Extracting Co-mention Features from Biomedical Literature for Automated Protein Phenotype Prediction using PHENOstruct

Published in 10th International Conference on Bioinformatics and Computational Biology (BiCoB), 2018

Download paper here

The Human Phenotype Ontology (HPO) is a recently introduced standard vocabulary for describing disease-related phenotypic abnormalities in humans. Since experimental determination of HPO categories for human proteins is highly resource-consuming, developing automated tools that can accurately predict HPO categories has recently gained interest. In our previous work, we developed PHENOstruct, an automated phenotype prediction tool that uses input features generated from heterogeneous data sources, including standard bag-of-words features extracted from biomedical literature. In this work, we introduce novel co-mention features, which are based on co-occurrences of protein names and HPO terms within a specified span of text. Our experimental results indicate that utilizing co-mentions significantly improves overall performance and that the most effective span is the paragraph level. This is the first study that uses a knowledge-based approach for generating literature features for the task of automated protein phenotype prediction. These findings have implications for practitioners interested in developing automated biocuration pipelines for phenotypes.
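To make the idea of a co-mention concrete, the following is a minimal sketch, not the extraction pipeline used in the paper. It assumes naive case-insensitive substring matching, a blank line as the paragraph boundary, and a simple punctuation-based sentence splitter. The function name `comention_counts`, the sample text, and the protein and term lists are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def comention_counts(text, protein_names, hpo_terms, span="paragraph"):
    """Count (protein, HPO term) pairs that co-occur within one span of text.

    Assumption: matching is plain case-insensitive substring search, which is
    far cruder than a real named-entity recognition step.
    """
    if span == "paragraph":
        # Assumption: paragraphs are separated by one or more blank lines.
        units = re.split(r"\n\s*\n", text)
    elif span == "sentence":
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        units = re.split(r"(?<=[.!?])\s+", text)
    else:
        raise ValueError("span must be 'paragraph' or 'sentence'")

    counts = Counter()
    for unit in units:
        lowered = unit.lower()
        proteins = [p for p in protein_names if p.lower() in lowered]
        terms = [t for t in hpo_terms if t.lower() in lowered]
        # Every protein/term pair found in the same unit is one co-mention.
        for protein in proteins:
            for term in terms:
                counts[(protein, term)] += 1
    return counts


# Hypothetical two-paragraph document.
sample_text = (
    "BRCA1 mutations were studied. Patients showed breast carcinoma.\n\n"
    "TP53 is linked to osteosarcoma."
)
sample_proteins = ["BRCA1", "TP53"]
sample_terms = ["breast carcinoma", "osteosarcoma"]

paragraph_level = comention_counts(sample_text, sample_proteins, sample_terms, "paragraph")
sentence_level = comention_counts(sample_text, sample_proteins, sample_terms, "sentence")
```

In this toy input, BRCA1 and "breast carcinoma" appear in the same paragraph but in different sentences, so the pair is counted at the paragraph level and missed at the sentence level. This illustrates how the choice of span changes which co-mentions are captured; it says nothing about why the paragraph level performed best in the reported experiments.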

Recommended citation: M. Pourreza Shahri and I. Kahanda, "Extracting Co-mention Features from Biomedical Literature for Automated Protein Phenotype Prediction using PHENOstruct", 10th International Conference on Bioinformatics and Computational Biology (BiCOB), Las Vegas, NV, USA, 2018.
