Linked Open Data is today a central trend in information provisioning. Data is collected in distributed data stores, individually curated with high quality, and made available over the Web for a wide variety of Web applications that provide their own business logic for data utilization. Thus, the key promise of Linked Open Data is to provide a holistic view of a wide range of data items or entities. However, as with database integration and schema matching, linking data across several sources remains a challenge and currently severely hampers the vision of a working Semantic Web. One possible solution is instance matching systems, which automatically create owl:sameAs links between data stores. According to existing benchmarks, their matching quality has even reached a satisfactory level. However, our extensive analysis shows that instance matching systems are not yet ready for large-scale data interlinking. The reason is that a query processor joining via even a single incorrectly created link implicitly also uses all transitive owl:sameAs links, which may in turn be mismatched. The result is similar to the game Chinese Whispers: watered-down sameAs semantics lead, step by step, to a terrible end-to-end quality of joins. We develop innovative structural mechanisms on top of instance matching systems that significantly improve query processing by avoiding the Chinese Whispers effect.
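
The effect of transitive owl:sameAs links can be illustrated with a minimal sketch. It is not the mechanism developed in this work; it only assumes that a query processor treats owl:sameAs as a symmetric, transitive relation and therefore joins over whole equivalence clusters. All identifiers (`a:Paris_France`, `b:Paris_TX`, and so on) and the helper names `same_as_clusters` and `implied_pairs` are hypothetical.

```python
from collections import defaultdict


def same_as_clusters(links):
    """Group identifiers into clusters under the symmetric, transitive
    closure of the given (a, b) sameAs links, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return sorted(sorted(members) for members in clusters.values())


def implied_pairs(clusters):
    """Number of identifier pairs a join treats as identical."""
    return sum(len(c) * (len(c) - 1) // 2 for c in clusters)


# Hypothetical links between three data stores a, b and c.
correct_links = [
    ("a:Paris_France", "b:Paris_FR"),
    ("b:Paris_FR", "c:Paris_Capital"),
    ("a:Paris_Texas", "b:Paris_TX"),
]
# A single mismatch produced by an instance matcher (assumed for illustration).
bad_link = ("c:Paris_Capital", "a:Paris_Texas")

before = same_as_clusters(correct_links)
after = same_as_clusters(correct_links + [bad_link])

print(len(before), implied_pairs(before))
print(len(after), implied_pairs(after))
```

In this toy setting the three correct links yield two clusters with 4 implied equivalences. Adding the one wrong link merges them into a single cluster with 10 implied equivalences, 6 of which are false, so one mismatch contaminates every join that passes through the cluster.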