Exposing the Hidden Web for Chemical Digital Libraries

Title: Exposing the Hidden Web for Chemical Digital Libraries
Publication Type: Conference Paper
Year of Publication: 2010
Authors: Tönnies, S., B. Köhncke, O. Koepler, and W.-T. Balke
Conference Name: 10th ACM/IEEE Joint Conference on Digital Libraries (JCDL)
Date Published: 06/2010
Conference Location: Surfers Paradise, Gold Coast, Australia

In recent years, the vast amount of digitally available content has led to the creation of many topic-centered digital libraries. In the domain of chemistry, too, more and more digital collections are available, but complex query formulation still hampers their intuitive adoption. This is because information seeking in chemical documents focuses on chemical entities, for which current standard search relies on complex structures that are hard to extract from documents. Moreover, although simple keyword searches would often be sufficient, current collections cannot be indexed by Web search providers because chemical substance names are ambiguous. In this paper we present a framework for automatically generating metadata-enriched index pages for all documents in a given chemical collection. All information is then linked to the respective documents and thus provides an easy-to-crawl metadata repository that promises to open up digital chemical libraries. Our experiments, indexing an open access journal, show not only that the documents can be found with a simple Google search via the automatically created index pages, but also that this search outperforms full-text indexing in terms of both precision/recall and performance. Finally, we compare our indexing against a classical structure search and find that keyword-based search can indeed solve at least some of the daily tasks in chemical workflows. Our framework thus promises to expose a large part of the currently still hidden chemical Web, making the techniques employed interesting for chemical information providers such as digital libraries and open access journals.

jcdl13_toennies.pdf (633.69 KB)