Result Set Diversification in Digital Libraries through the Use of Paper’s Claims

Title: Result Set Diversification in Digital Libraries through the Use of Paper’s Claims
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Pinto, J. M. G., and W.-T. Balke
Conference Name: The 19th International Conference on Asia-Pacific Digital Libraries (ICADL 2017)
Conference Location: Bangkok, Thailand

Understanding the possible associations between two entities in a query is a hard problem. For instance, a query for “coffee” and “cancer”, even in a curated digital library, challenges the retrieval system, which struggles to determine the intent behind the query. Perhaps the user wants a consensus of what is known? But how many different associations exist, and how can all of them be found? Herein we introduce an approach that diversifies the results retrieved for such queries by re-ranking the result list. Our re-ranking specifically models one fundamental aspect of scientific papers: claims. Claims are the sentences that scientists use to report findings. In particular, we study claims that express associations between entities in the medical domain. More specifically, we focus on queries involving two entities in which one entity has some effect on a disease. To empirically assess our proposed solution, we work on a corpus obtained by querying PubMed. Moreover, we promote the idea of claims as an explicit key aspect to consider when diversifying the result set of a query. Our approach relies on a representation of claims based on neural word embeddings and implements an algorithm that re-ranks the result set of a query. We empirically show the potential of our approach to ease the process of discovering representative associations between entities.
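
The abstract does not spell out the re-ranking algorithm, so the following is only an illustrative sketch of how claim-based diversification could work, not the authors' implementation. It assumes each claim is represented by the average of its word vectors and swaps in a generic greedy Maximal Marginal Relevance (MMR) scheme for the re-ranking step. The function names, the `trade_off` parameter, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_vectors(vectors):
    """Assumed claim representation: the component-wise mean of its word vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def diversify(claim_vectors, relevance, trade_off=0.5):
    """Greedy MMR re-ranking (a stand-in technique, not the paper's algorithm).

    claim_vectors: one embedding per retrieved paper's claim.
    relevance:     original retrieval score per paper, same order.
    trade_off:     weight on relevance versus novelty (hypothetical parameter).
    Returns the indices of the papers in re-ranked order.
    """
    remaining = list(range(len(claim_vectors)))
    ranked = []
    while remaining:
        def score(i):
            # Redundancy: similarity to the closest claim already selected.
            redundancy = max(
                (cosine(claim_vectors[i], claim_vectors[j]) for j in ranked),
                default=0.0,
            )
            return trade_off * relevance[i] - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy data: claims 0 and 1 state nearly the same association; claim 2 differs.
toy_claims = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
toy_relevance = [0.9, 0.85, 0.6]
order = diversify(toy_claims, toy_relevance)
```

With the toy data, the near-duplicate claim is pushed below the dissimilar one even though its original relevance score is higher, which is the intended effect of diversifying by claims.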

Attachment: diversification.pdf (403.19 KB)