Capturing Concept Similarity with Knowledge Graphs

Tracking #: 3022-4236

This paper is currently under review
Filip Ilievski
Kartik Shenoy
Nicholas Klein
Hans Chalupsky
Pedro Szekely

Responsible editor: Harald Sack

Submission type: Full Paper

Abstract:
Robust estimation of concept similarity is crucial for a range of AI applications, such as deduplication, recommendation, and entity linking. The rich and diverse knowledge in large knowledge graphs such as Wikidata can be exploited for this purpose. In this paper, we study a wide range of representative similarity methods for Wikidata, organized into three categories, and leverage additional knowledge as a self-supervision signal through retrofitting. We measure the impact of retrofitting with subsets of Wikidata and ProBase whose edges are scored using language models. Experiments on three benchmarks reveal that pairing language models with rich information performs best, whereas retrofitting helps most for methods that do not originally consider comprehensive information. The performance of retrofitting depends on the source of knowledge and on the edge weighting function. Meanwhile, creating evaluation benchmarks for contextual similarity in Wikidata remains a key challenge.
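The retrofitting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the widely used iterative update of Faruqui et al. (2015), in which each concept vector is pulled toward its graph neighbors while staying anchored to its original embedding; the paper's exact formulation and edge weighting function may differ. The function names `retrofit` and `cosine`, the concept labels, the toy vectors, and the edge weights are all hypothetical and chosen only for illustration.

```python
def retrofit(embeddings, edges, alpha=1.0, iterations=10):
    """Weighted retrofitting sketch (assumed Faruqui-style update).

    embeddings: dict mapping concept -> list of floats (original vectors)
    edges:      dict mapping concept -> list of (neighbor, weight) pairs
    alpha:      how strongly a concept stays anchored to its original vector
    """
    original = {c: list(v) for c, v in embeddings.items()}
    current = {c: list(v) for c, v in embeddings.items()}
    for _ in range(iterations):
        for concept, neighbors in edges.items():
            if concept not in current:
                continue
            # Keep only neighbors that have a vector and a positive weight.
            usable = [(n, w) for n, w in neighbors if n in current and w > 0]
            if not usable:
                continue
            total = alpha + sum(w for _, w in usable)
            dim = len(original[concept])
            # Weighted average of the original vector and the neighbor vectors.
            updated = [alpha * original[concept][d] for d in range(dim)]
            for n, w in usable:
                for d in range(dim):
                    updated[d] += w * current[n][d]
            current[concept] = [x / total for x in updated]
    return current


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: two related concepts and one unrelated concept.
toy_embeddings = {"cat": [1.0, 0.0], "feline": [0.0, 1.0], "car": [1.0, 0.1]}
# Hypothetical edge weights, standing in for language-model-based scores.
toy_edges = {"cat": [("feline", 1.0)], "feline": [("cat", 1.0)]}

retrofitted = retrofit(toy_embeddings, toy_edges)
print(cosine(toy_embeddings["cat"], toy_embeddings["feline"]))
print(cosine(retrofitted["cat"], retrofitted["feline"]))
```

In this sketch, concepts without edges (here "car") keep their original vectors, which mirrors the observation that retrofitting only changes representations for which additional knowledge is available. Larger edge weights pull linked concepts closer together, so the choice of weighting function directly shapes the resulting similarities.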
Full PDF Version: Under Review