Catching the Drift – Indexing Implicit Knowledge in Chemical Digital Libraries

Title: Catching the Drift – Indexing Implicit Knowledge in Chemical Digital Libraries
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Köhncke, B., S. Tönnies, and W.-T. Balke
Conference Name: International Conference on Theory and Practice of Digital Libraries (TPDL)
Date Published: 09/2012
Conference Location: Paphos, Cyprus

In the domain of chemistry, the information gathering process is highly focused on chemical entities. However, because of synonyms and differing entity representations, indexing chemical documents is a challenging process. In the field of drug design, the task is even more complex. Domain experts in this field are usually not interested in a particular chemical entity itself, but in representatives of some chemical class showing a specific reaction behavior. For describing the reaction behavior of chemical entities, the most interesting parts are their functional groups. The delimitation of each chemical class is partly related to the entities’ reaction behavior, but also rests on the chemist’s implicit knowledge. In this paper, we present an approach that deals with this implicit knowledge by clustering chemical entities based on their functional groups. However, since such clusters are generally too unspecific and contain chemical entities from different chemical classes, we further divide them into sub-clusters using fingerprint-based similarity measures. We analyze several uncorrelated fingerprint/similarity measure combinations and show that the entities most similar to a query entity can be found in the respective sub-cluster. Furthermore, we use our approach for document retrieval, introducing a new similarity measure based on Wikipedia categories. Our evaluation shows that the sub-clustering leads to suitable results, enabling sophisticated document retrieval in chemical digital libraries.
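
The two-stage idea in the abstract (group entities by functional groups, then split each group by fingerprint similarity) can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the fingerprints, similarity measures, or clustering algorithm used, so the sketch assumes fingerprints are sets of "on" bit positions, uses the Tanimoto coefficient as one common fingerprint similarity, and splits clusters with a simple greedy leader scheme. The entity names, functional-group labels, bit values, and the 0.5 threshold are all hypothetical toy data.

```python
from collections import defaultdict


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def cluster_by_functional_groups(groups_of: dict) -> dict:
    """Stage 1: entities with the same set of functional groups share a cluster."""
    clusters = defaultdict(list)
    for name, groups in groups_of.items():
        clusters[frozenset(groups)].append(name)
    return dict(clusters)


def sub_cluster(names: list, fingerprint_of: dict, threshold: float) -> list:
    """Stage 2 (assumed scheme): greedy leader clustering.

    Each entity joins the first sub-cluster whose first member (the leader)
    is at least `threshold` similar; otherwise it starts a new sub-cluster.
    """
    subs = []
    for name in names:
        for sub in subs:
            leader = sub[0]
            if tanimoto(fingerprint_of[name], fingerprint_of[leader]) >= threshold:
                sub.append(name)
                break
        else:
            subs.append([name])
    return subs


# Hypothetical toy data: labels and bit positions are made up for illustration.
groups_of = {
    "entity_a": {"hydroxyl"},
    "entity_b": {"hydroxyl"},
    "entity_c": {"hydroxyl"},
    "entity_d": {"carboxyl"},
}
fingerprint_of = {
    "entity_a": frozenset({1, 2, 3, 4}),
    "entity_b": frozenset({1, 2, 3, 5}),
    "entity_c": frozenset({7, 8, 9}),
    "entity_d": frozenset({1, 2, 3, 4}),
}

clusters = cluster_by_functional_groups(groups_of)
hydroxyl_subs = sub_cluster(clusters[frozenset({"hydroxyl"})], fingerprint_of, 0.5)
```

With this toy data, the three hydroxyl entities land in one functional-group cluster, which the second stage splits into `[["entity_a", "entity_b"], ["entity_c"]]`. `entity_d` stays in a separate cluster even though its fingerprint equals that of `entity_a`, because the functional-group grouping is applied first.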

koehncke_tpdl12.pdf (696.04 KB)