This paper introduces the novel problem of ‘claim-based queries’ and shows how digital libraries can be enabled to solve it. Claim-based queries require the identification of a key aspect of research papers: claims. Today, claims are hidden in the unstructured, free-text representation of research documents. In this work, a claim is a sentence that constitutes the main contribution of a paper and expresses an association between entities of particular interest in a given domain. We investigate how to identify claims for subsequent extraction in an unsupervised fashion through a novel integration of neural word embedding representations of claims with a graph-based algorithm. For evaluation purposes, we focus on the medical domain: all experiments are based on a real-world corpus from PubMed, where both the limitations and the success of our solution can be realistically assessed.