BioSGAN: Protein-phenotype co-mention classification using semi-supervised generative adversarial networks

Document Type

Conference Proceeding

Publication Date

6-1-2021

Abstract

Valuable information relating human proteins to their phenotypes remains hidden from biomedical scientists in the literature because of the rapid growth in biomedical publications. Previous studies that developed computational methods to extract this knowledge have mostly relied on rule-based linguistic patterns and supervised machine learning approaches. In this work, we propose the use of generative adversarial networks to develop a novel method called BioSGAN for the protein-phenotype co-mention classification task. We demonstrate the potential of combining a small labeled dataset with vast unlabeled biomedical text data extracted from Medline abstracts and PubMed Central Open Access full-text articles in a semi-supervised machine learning framework. Our method achieves state-of-the-art performance in classifying the validity of a given sentence-level co-mention of a human protein and a phenotype, convincingly outperforming a traditional machine learning-based counterpart. These findings have implications for biocurators, researchers, and the text mining community involved in biomedical relation extraction.

Publication Title

Proceedings - IEEE Symposium on Computer-Based Medical Systems

Volume

2021-June

First Page

468

Last Page

473

Digital Object Identifier (DOI)

10.1109/CBMS52027.2021.00055

ISSN

1063-7125

ISBN

9781665441216
