Researchers often face a common problem: how does one know whether a research hypothesis is worth investigating? Given the growing number of research publications, such decisions are increasingly difficult to guide. Previous work has shown that predicting generally emerging research topics can provide some help. Yet, in specialized scientific domains, little is known about how to provide a service that helps users identify scientific claims worth investigating. Here, a scientific claim is a natural-language sentence that expresses a relationship between two entities, in particular how one of them affects, manipulates, or causes the other. In this paper, we propose a data-driven approach that aims to fill this gap and to empower users at the query level: given the results of a query, we cluster them and characterize each cluster, which reveals the context of the scientific claims and identifies those claims that may merit further research effort. To do so, we use co-clustering to group documents whose scientific claims share the same context. We then characterize the clusters in order to annotate them. Our annotation focuses on two core aspects: the controversy and the diversity of the claims in a given cluster. Controversy arises when two or more claims semantically contradict each other; diversity means that the cluster contains claims with different semantics that do not contradict each other but offer different insights, each expressed by some paper. To evaluate the benefits of our approach, we performed an extensive retrospective analysis on PubMed.
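To make the two annotation aspects concrete, the following minimal sketch shows one way controversy and diversity could be flagged for the claims in a single cluster. It is an illustration under stated assumptions, not the method evaluated in this paper: the claim representation as (subject, relation, object) triples, the `OPPOSITES` table of contradicting relations, and the function name `annotate_cluster` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Assumption: each claim is a (subject, relation, object) triple.
# Assumption: a hand-written table lists the relation pairs treated as
# semantically contradictory; a real system would need a richer model.
OPPOSITES = {
    frozenset(("increases", "decreases")),
    frozenset(("causes", "prevents")),
}


def annotate_cluster(claims):
    """Flag controversy and diversity per entity pair in one cluster."""
    relations_by_pair = defaultdict(set)
    for subject, relation, obj in claims:
        relations_by_pair[(subject, obj)].add(relation)

    annotation = {}
    for pair, relations in relations_by_pair.items():
        relation_pairs = [
            frozenset(p) for p in combinations(sorted(relations), 2)
        ]
        annotation[pair] = {
            # Controversy: at least two claims contradict each other.
            "controversial": any(p in OPPOSITES for p in relation_pairs),
            # Diversity: at least two claims differ without contradicting.
            "diverse": any(p not in OPPOSITES for p in relation_pairs),
        }
    return annotation
```

In this sketch, an entity pair can be both controversial and diverse when it carries three or more distinct relations, which matches the idea that the two aspects describe a cluster independently.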