Putting Instance Matching to the Test. Is Instance Matching Ready for Reliable Data Linking?

One of the main principles of linked data is that the various data stores are extensively interlinked through owl:sameAs relations. Unfortunately, the cross-linkage is not as extensive as one would hope. To address this problem, instance matching systems, which automatically discover owl:sameAs links based on entity similarity, have been proposed. According to the results obtained on existing benchmarks, such systems seem to have reached a level of maturity. However, the evaluation benchmarks miss some important characteristics encountered in real-world data. To establish whether instance matching systems are ready for real-world data interlinking, we analyzed the main challenges of instance matching. We built a representative data set that emphasizes these challenges and evaluated the quality of instance matching systems, using as an example a top performer from last year’s Instance Matching track, which is organized yearly by the Ontology Alignment Evaluation Initiative (OAEI).

To encourage further research on this topic, we made the data set we prepared, as well as the data generated by all our experiments, publicly available on this page:

Useful links:
The Virtuoso SWDB is accessible at http://lod.openlinksw.com/
Instance Matching OAEI 2013: http://www.instancematching.org/oaei/imei2013/results.html
SLINT+: http://ri-www.nii.ac.jp/SLINT/index.html

instance_matching_90K.zip (67.38 MB)