Automatically cataloging scholarly articles using library of congress subject headings

Document Type

Conference Proceeding

Publication Date

1-1-2021

Abstract

Institutes are required to catalog their articles with proper subject headings so that the users can easily retrieve relevant articles from the institutional repositories. However, due to the rate of proliferation of the number of articles in these repositories, it is becoming a challenge to manually catalog the newly added articles at the same pace. To address this challenge, we explore the feasibility of automatically annotating articles with Library of Congress Subject Headings (LCSH). We first use web scraping to extract keywords for a collection of articles from the Repository Analytics and Metrics Portal (RAMP). Then, we map these keywords to LCSH names for developing a gold-standard dataset. As a case study, using the subset of Biology-related LCSH concepts, we develop predictive models by formulating this task as a multi-label classification problem. Our experimental results demonstrate the viability of this approach for predicting LCSH for scholarly articles.

Publication Title

EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Student Research Workshop

First Page

43

Last Page

49

ISBN

9781954085046

This document is currently not available here.

Share

COinS