Mining Semantic Subspaces to Express Discipline-Specific Similarities

Publication Type: Conference Paper
Year of Publication: 2020
Authors: Wawrzinek, J., J. M. G. Pinto, and W.-T. Balke
Conference Name: ACM/IEEE Joint Conference on Digital Libraries (JCDL'20)
Date Published: 08/2020
Conference Location: Xi'an, Shaanxi, China

Word embeddings enable state-of-the-art NLP workflows in important tasks including semantic similarity matching, named entity recognition (NER), question answering, and document classification. Recently, the biomedical field has also started to use word embeddings to provide new access paths for a better understanding of pharmaceutical entities and their relationships, as well as to predict certain chemical properties. The central idea is to gain access to knowledge that is embedded, but not explicated, in biomedical literature. However, a core challenge is the interpretability of the underlying embedding model. Previous work has attempted to interpret the semantics of the dimensions of word embedding models to ease model interpretation when applied to semantic similarity tasks. To do so, the original embedding space is transformed into a sparse or a more condensed space, which then has to be interpreted in an exploratory (and hence time-consuming) fashion. However, little has been done to assess in real time whether specific user-provided semantics are actually reflected in the original embedding space. We solve this problem by extracting a semantic subspace from a large embedding space that better fits the query semantics defined by a user. Our method builds on least-angle regression to rank dimensions according to the given semantics, i.e., to uncover a subspace that eases both interpretation and exploration of the embedding space. We compare our methodology to querying the original space, as well as to several other recent approaches, and show that our method consistently outperforms all competitors.
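The core idea of ranking embedding dimensions with least-angle regression (LARS) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding matrix is random toy data, and the user-provided semantics are encoded here as a simple binary target vector, which is an assumption; the paper's actual target construction may differ. The sketch uses scikit-learn's `Lars` estimator, whose stepwise path yields the order in which dimensions enter the model and thus a ranking of dimensions by relevance to the query semantics.

```python
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(0)
# Toy stand-in for a word-embedding matrix: 100 words x 50 dimensions.
X = rng.normal(size=(100, 50))
# Hypothetical user-provided semantics: 1.0 for words that match the
# query concept, 0.0 otherwise. Here the concept is (by construction)
# carried mainly by dimensions 3 and 17.
y = (X[:, 3] + 0.5 * X[:, 17] > 0).astype(float)

# LARS adds one dimension to the active set per step; the entry order
# serves as a ranking of dimensions with respect to the semantics in y.
lars = Lars(n_nonzero_coefs=10).fit(X, y)
ranked_dims = list(lars.active_)  # dimensions in order of entry

# The top-ranked dimensions span the extracted semantic subspace.
subspace = X[:, ranked_dims[:5]]
print(ranked_dims[:5])
```

In this construction, the planted dimensions (3 and 17) should enter the active set early, so restricting queries to the top-ranked columns keeps the part of the space that actually reflects the user's semantics.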

Wawrzinek85.pdf (782.76 KB)