The Concepts of the Crowd: Mining Perceptual Attributes from Rating Data on the Social Web

Publication Type: Thesis
Year of Publication: 2012
Authors: Selke, J.
Academic Department: Carl-Friedrich-Gauß-Fakultät
University: Technische Universität Braunschweig
Thesis Type: Doctoral Thesis

With its huge amount of information, the World Wide Web mirrors all aspects of our everyday life. In recent years, various efforts have been undertaken to refine this unstructured information into a structured form. In particular, this would allow relational database systems to process the information available on the Web in a well-proven fashion, thus enabling database users to use this information efficiently.

Particularly challenging is the discovery of so-called perceptual concepts. In contrast to “hard” facts, which are currently extracted mostly by analyzing textual information, perceptual concepts primarily concern the common perception of people. This perception is characterized by the fact that many properties of real-world objects cannot be described easily in explicit form. Examples of perceptual concepts are the “sportiness” of a car, the “suspense” of a movie, and the “creativity” of a restaurant.

This cumulative doctoral thesis develops one of the first approaches to extracting an entity’s perceptual properties from Social Web data. The thesis focuses on ratings of the type “user X rates item Y a Z out of 10,” which can now be found on a variety of Web sites. Through an extensive analysis of users’ rating behavior, the thesis demonstrates that groups of objects that users perceive as similar with respect to one or more perceptual properties can be identified automatically. By linking the resulting perceptual spaces (an abstract model for describing perceptual similarity) with external reference information, detailed structured descriptions of objects can be derived automatically and then used in database systems.
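
To make the idea of similarity derived from rating behavior more concrete, the following is a minimal sketch and not the method developed in the thesis. It assumes a simple item-to-item cosine similarity over mean-centered ratings; the toy ratings, the item names, and the function names `item_vectors`, `similarity`, and `most_similar` are all hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical toy data: (user, item, rating out of 10).
RATINGS = [
    ("ann", "Alien", 9), ("ann", "Jaws", 8), ("ann", "Amelie", 3),
    ("bob", "Alien", 8), ("bob", "Jaws", 9), ("bob", "Amelie", 2),
    ("cat", "Alien", 2), ("cat", "Jaws", 3), ("cat", "Amelie", 9),
]


def item_vectors(ratings):
    """Map each item to {user: rating minus that user's mean rating}."""
    by_user = {}
    for user, item, score in ratings:
        by_user.setdefault(user, {})[item] = score
    vectors = {}
    for user, scores in by_user.items():
        # Centering removes each user's personal rating bias.
        mean = sum(scores.values()) / len(scores)
        for item, score in scores.items():
            vectors.setdefault(item, {})[user] = score - mean
    return vectors


def similarity(a, b):
    """Cosine similarity restricted to users who rated both items."""
    shared = a.keys() & b.keys()
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(a[u] ** 2 for u in shared))
    norm_b = sqrt(sum(b[u] ** 2 for u in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(item, vectors):
    """Return the other item whose rating pattern is closest to `item`."""
    others = [other for other in vectors if other != item]
    return max(others, key=lambda o: similarity(vectors[item], vectors[o]))


VECTORS = item_vectors(RATINGS)
```

In this toy data, `most_similar("Alien", VECTORS)` picks the item that the same users tend to rate in the same direction. No item description is consulted at any point, which is the sense in which similarity here comes purely from rating behavior.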

This work performs a detailed investigation of the proposed methods for analyzing rating data and derives innovative application scenarios from them. Particularly important is the combination of the methods developed with a novel technique from the area of crowdsourcing. Among other things, it is shown that recently invented crowd-enabled databases can benefit massively from the approach developed in this thesis: applying crowdsourcing becomes easier, data quality increases, and costs are reduced significantly.

In total, this cumulative doctoral thesis is based on eight peer-reviewed publications.

dissertation-selke.pdf (6.19 MB)