Demystifying the Semantics of Relevant Objects in Scholarly Collections: A Probabilistic Approach

Title: Demystifying the Semantics of Relevant Objects in Scholarly Collections: A Probabilistic Approach
Publication Type: Conference Paper
Year of Publication: 2015
Authors: Pinto, J. M. G., and W.-T. Balke
Conference Name: ACM/IEEE Joint Conference on Digital Libraries (JCDL)
Date Published: 06/2015
Conference Location: Knoxville, TN, USA

Efforts to make highly specialized knowledge accessible through scientific digital libraries need to go beyond mere bibliographic metadata, since information search in this setting is mostly entity-centric. Previous work has acknowledged this trend and developed different methods to recognize and (to some degree even automatically) annotate several important types of entities: genes and proteins, chemical structures and molecules, or drug names, to name but a few. Moreover, such entities are often cross-referenced with entries in curated databases. However, several questions remain to be answered: Given a scientific discipline, what are the important entities? How can they be automatically identified? Are all of them really relevant, i.e., do all of them carry deeper semantics for assessing a publication? How can they be represented, described, and subsequently annotated? How can they be used for search tasks? In this work we focus on answering some of these questions. We claim that, to bring the use of scientific digital libraries to the next level, we must treat topic-specific entities as first-class citizens and deeply integrate their semantics into the search process. To support this, we propose a novel probabilistic approach that not only successfully provides a solution to the integration problem, but also demonstrates how to leverage the knowledge encoded in entities, and we provide insights into the use of our approach in different scenarios. Finally, we show how our results can benefit information providers.
