With the increasing amount of user-generated content such as scientific blogs, question-answering archives (e.g., Quora or Stack Overflow), and Wikipedia, the challenge of evaluating its quality naturally arises. Previous work has shown the potential to automatically evaluate such content at the syntactic and pragmatic levels, focusing on attributes such as conciseness, organization, and readability. We push these efforts forward and focus on developing an intelligent service that eases user engagement with two semantic attributes: factual accuracy, i.e., whether the stated facts are correct, and validity, i.e., whether reliable sources support the content. To do so, we deploy a deep learning approach to learn citation categories from Wikipedia. We thus introduce an automatic mechanism that accurately determines which specific citation category is needed, helping users increase the value of their contributions at the semantic level. To that end, we automatically learn linguistic patterns from Wikipedia to support a broad range of fields. We extensively evaluated several machine learning models on more than one million sentences annotated through the massive effort of Wikipedia contributors. We compare the performance of the different methods and present an in-depth analysis focusing on the balanced accuracy achieved.
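Because citation-category labels on Wikipedia are typically skewed (most sentences need no citation), balanced accuracy is a natural evaluation metric. As an illustrative sketch, not the paper's actual code, the metric is simply the macro-average of per-class recall; the function name and the toy labels below are our own:

```python
# Illustrative sketch: balanced accuracy as the mean of per-class recall,
# which corrects for the skewed class distribution of citation data.
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)   # true positives per class
    total = defaultdict(int)     # support (true count) per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Toy example with a skewed label distribution ("none" dominates):
y_true = ["none", "none", "none", "none", "citation", "citation"]
y_pred = ["none", "none", "none", "none", "citation", "none"]
# plain accuracy = 5/6 ≈ 0.83, but balanced accuracy = (1.0 + 0.5) / 2 = 0.75
```

On this toy data a majority-class baseline would look deceptively strong under plain accuracy, while balanced accuracy exposes the missed minority class.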