DeepPPPred: Deep Ensemble Learning with Transformers, Recurrent and Convolutional Neural Networks for Human Protein-Phenotype Co-mention Classification

Document Type

Conference Proceeding

Publication Date



The extensive biomedical literature is arguably the best source of knowledge on the latest scientific findings and fundamental problems for the biological and clinical communities. However, these articles consist of unstructured text, so this knowledge may remain hidden without manual curation, which is tedious and time-consuming given the rapid growth in the number of publications. The relationships between human proteins and the phenotypic abnormalities associated with human disease are one such area of valuable knowledge. This situation calls for accurate computational tools that can automatically infer these associations from text, helping human curators expedite their triage and information extraction tasks. This work develops DeepPPPred, a deep ensemble learning model for protein-phenotype co-mention classification at the sentence level. In particular, DeepPPPred combines Support Vector Machines, Transformer models, Recurrent Neural Networks, and Convolutional Neural Networks via stacking. Our experimental results, obtained on a manually curated gold-standard dataset, demonstrate that DeepPPPred achieves state-of-the-art performance, outperforming all of its competitors. This is the first study to develop deep learning models for classifying human protein-phenotype co-mentions. Our findings have implications for the biological and clinical communities, as well as for text mining and natural language processing developers working on biomedical relation extraction.
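
For readers unfamiliar with stacking, the general idea can be illustrated with a minimal, dependency-free sketch. This is not the DeepPPPred implementation: the three base learners below are simple count-based scorers standing in for the SVM, Transformer, RNN, and CNN models, the meta-learner is a small logistic regression, and every function name, the `PROT`/`PHENO` placeholder tokens, and the toy sentences are assumptions invented for illustration. Each base learner produces out-of-fold scores on the training sentences, and the meta-learner is then trained on those scores.

```python
import math
from collections import Counter

# Three feature views; each one drives a toy base learner (hypothetical stand-ins).
def unigrams(s):
    return s.lower().split()

def bigrams(s):
    t = s.lower().split()
    return [a + " " + b for a, b in zip(t, t[1:])]

def char_trigrams(s):
    s = s.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_base(featurize, sentences, labels):
    """Fit a smoothed log-odds scorer and return its predict function."""
    pos, neg = Counter(), Counter()
    for s, y in zip(sentences, labels):
        (pos if y == 1 else neg).update(featurize(s))

    def predict(s):
        feats = featurize(s)
        if not feats:
            return 0.5  # no evidence either way
        score = sum(math.log((pos[f] + 1) / (neg[f] + 1)) for f in feats)
        return sigmoid(score / len(feats))

    return predict

def out_of_fold(featurize, sentences, labels, k):
    """Score each sentence with a model that did not see it during fitting."""
    preds = [0.0] * len(sentences)
    for fold in range(k):
        train = [i for i in range(len(sentences)) if i % k != fold]
        model = fit_base(featurize,
                         [sentences[i] for i in train],
                         [labels[i] for i in train])
        for i in range(len(sentences)):
            if i % k == fold:
                preds[i] = model(sentences[i])
    return preds

def fit_meta(rows, labels, epochs=500, lr=0.5):
    """Logistic-regression meta-learner trained by stochastic gradient descent."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            g = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def fit_stack(sentences, labels, k=3):
    """Stack the base learners: the meta-learner is fit on out-of-fold scores."""
    views = [unigrams, bigrams, char_trigrams]
    oof = [out_of_fold(v, sentences, labels, k) for v in views]
    rows = [list(r) for r in zip(*oof)]
    w, b = fit_meta(rows, labels)
    bases = [fit_base(v, sentences, labels) for v in views]  # refit on all data

    def predict_proba(s):
        x = [m(s) for m in bases]
        return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

    return predict_proba

# Invented toy sentences (1 = co-mention expresses a relationship, 0 = it does not).
SENTENCES = [
    "mutations in PROT cause PHENO",
    "PROT was measured in patients without PHENO",
    "loss of PROT leads to PHENO",
    "PROT and PHENO were listed in separate tables",
    "PROT deficiency results in PHENO",
    "PHENO was unrelated to PROT levels",
]
LABELS = [1, 0, 1, 0, 1, 0]

model = fit_stack(SENTENCES, LABELS)
print(round(model("variants in PROT cause PHENO"), 3))
```

Training the meta-learner on out-of-fold scores, rather than on scores from models that have already seen the same sentences, is the usual way to keep the combiner from simply trusting whichever base learner overfits most.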

Publication Title

Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021

First Page


Last Page


Digital Object Identifier (DOI)