PPPred: Classifying protein-phenotype co-mentions extracted from biomedical literature

Published in Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2019

Download paper here

The MEDLINE database provides an extensive source of scientific articles and heterogeneous biomedical information in the form of unstructured text. Among the most important knowledge contained in these articles are the relations between human proteins and their phenotypes, which can remain hidden because of the exponential growth of publications. This has created a range of opportunities for developing computational methods that extract these biomedical relations from the articles. However, no such method currently exists for the automated extraction of relations involving human proteins and Human Phenotype Ontology (HPO) terms. In our previous work, we developed a comprehensive database composed of all co-mentions of proteins and phenotypes. In this study, we present a supervised machine learning approach called PPPred (Protein-Phenotype Predictor) for classifying the validity of a given…

Recommended citation: M. Pourreza Shahri, G. Reynolds, M. M. Roe, and I. Kahanda, "PPPred: Classifying Protein-phenotype Co-mentions Extracted from Biomedical Literature", Proceedings of the 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB), Niagara Falls, NY, USA, 2019.
