Open Information Extraction in Digital Libraries: Current Challenges and Open Research Questions

Publication Type: Conference Paper
Year of Publication: 2021
Authors: Kroll, H., J. Al-Chaar, and W.-T. Balke
Conference Name: 1st International Workshop on Digital Infrastructures for Scholarly Content Objects (DISCO@JCDL2021)
Date Published: 09/2021
Publisher: CEUR Proceedings, Vol-2976
Conference Location: Urbana-Champaign, IL, USA
Abstract

A central challenge for digital libraries is to provide effective access paths to ever-growing collections of mostly textual, i.e., unstructured, information. The traditional, yet expensive, way to manage, categorize, and annotate such collections is extensive manual metadata curation that semantically enriches library items. The ability to convert textual information automatically into a structured representation would therefore be highly beneficial, enabling novel access paths and supporting semantically meaningful discovery. This paper investigates the opportunities and challenges that the latest open information extraction techniques offer for digital libraries. Open information extraction promises to work out of the box without requiring domain-specific training data. To assess how well such tools perform, we conduct a qualitative evaluation in two domains: general news and biomedicine. Our research shows current benefits but also reveals serious challenges for practical applications. In particular, three research questions still have to be solved before open information extraction can be used reliably in digital library projects.
