BioSGAN: Protein-Phenotype Co-mention Classification Using Semi-Supervised Generative Adversarial Networks
Document Type
Conference Proceeding
Publication Date
6-1-2021
Abstract
Valuable and relevant information in the biomedical literature relating human proteins to their phenotypes remains hidden from biomedical scientists because of the rapid rise in biomedical publications. Previous studies that developed computational methods to extract this knowledge mostly rely on rule-based linguistic patterns and supervised machine learning approaches. In this work, we propose the use of generative adversarial networks to develop a novel method called BioSGAN for the protein-phenotype co-mention classification task. We demonstrate the potential of combining a small labeled dataset with vast unlabeled biomedical text data extracted from MEDLINE abstracts and PubMed Central Open Access full-text articles in a semi-supervised machine learning framework. Our method achieves state-of-the-art performance in classifying the validity of a given sentence-level co-mention of a human protein and phenotype, convincingly outperforming a traditional machine learning-based counterpart. These findings have implications for biocurators, researchers, and the text mining community involved in biomedical relation extraction.
Publication Title
Proceedings - IEEE Symposium on Computer-Based Medical Systems
Volume
2021-June
First Page
468
Last Page
473
Digital Object Identifier (DOI)
10.1109/CBMS52027.2021.00055
ISSN
1063-7125
ISBN
9781665441216
Citation Information
F. Anokye and I. Kahanda, "BioSGAN: Protein-Phenotype Co-mention Classification Using Semi-Supervised Generative Adversarial Networks," 2021 IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS), 2021, pp. 468-473, doi: 10.1109/CBMS52027.2021.00055.