Analysis of the Performance of Representation Learning Methods for Entity Alignment: Benchmark vs. Real-world Data

Tracking #: 3775-4989

This paper is currently under review
Authors: 
Ensiyeh Raoufi
Bill Gates Happi Happi
Pierre Larmande
Francois Scharffe
Konstantin Todorov

Responsible editor: 
Guest Editors OM-ML 2024

Submission type: 
Full Paper
Abstract: 
Representation learning for Entity Alignment (EA) aims to map, across two Knowledge Graphs (KGs), distinct entities that correspond to the same real-world object, using a shared embedding space. The similarity of the learned entity embeddings then serves as a proxy for the similarity of the actual entities. Although many embedding-based models show very good performance on established synthetic benchmark datasets, in this paper we demonstrate that benchmark overfitting limits the applicability of these methods in real-world scenarios, where the data are highly heterogeneous, incomplete, and domain-specific. While there have been efforts to employ sampling algorithms to generate benchmark datasets that reflect real-world scenarios as closely as possible, a comprehensive analysis comparing the performance of methods on synthetic benchmarks with their performance on original, real-world heterogeneous datasets is still lacking. In addition, most existing models report their performance after excluding from the alignment candidate search space all entities that are not part of the validation data. This under-represents the knowledge and data contained in the KGs and limits the ability of these models to find new alignments in large-scale KGs. We analyze models with competitive performance on widely used synthetic benchmark datasets, such as the cross-lingual DBP15K. We compare the performance of the selected models on real-world heterogeneous datasets beyond DBP15K and show that, due to the drawbacks mentioned above, most current approaches cannot effectively discover mappings between entities in the real world. We compare the selected methods from several aspects and measure joint semantic similarity and profiling properties of the KGs to explain the models' performance drop on real-world datasets.
Furthermore, we show how tuning EA models with a candidate search space restricted to validation data affects their performance and causes generalization issues. By addressing practical challenges in applying EA models to heterogeneous datasets and providing insights for future research, we signal the need for more robust solutions in real-world applications.
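The two mechanisms the abstract refers to can be made concrete with a minimal sketch (numpy only; the function name, the toy embeddings, and the test pairs are all hypothetical, not taken from the paper). Embedding-based EA ranks target entities for each source entity by cosine similarity of their learned embeddings; Hits@1 counts how often the top-ranked candidate is the correct counterpart. Passing `candidate_ids` mimics the evaluation shortcut criticized above: ranking only over validation/test entities instead of the whole target KG.

```python
import numpy as np

def hits_at_1(src_emb, tgt_emb, test_pairs, candidate_ids=None):
    """Hits@1 for embedding-based entity alignment via cosine similarity.

    src_emb, tgt_emb : (n, d) embedding matrices of the two KGs.
    test_pairs       : list of (source_index, target_index) gold links.
    candidate_ids    : optional subset of target entities to rank over,
                       mimicking evaluation restricted to test/validation
                       entities rather than the full target KG.
    """
    # L2-normalize rows so that a dot product equals cosine similarity
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    cand = np.arange(len(tgt)) if candidate_ids is None else np.asarray(candidate_ids)
    sims = src @ tgt[cand].T          # (n_src, n_candidates) similarity matrix
    hits = 0
    for s, t in test_pairs:
        best = cand[np.argmax(sims[s])]   # top-1 candidate for source entity s
        hits += int(best == t)
    return hits / len(test_pairs)

# Toy illustration: target entity 2 is a distractor that is more similar to
# source entity 0 than its gold counterpart (target entity 0) is.
src = np.array([[1.0, 0.2], [0.0, 1.0]])
tgt = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.2], [1.0, 1.0]])
pairs = [(0, 0), (1, 1)]

full = hits_at_1(src, tgt, pairs)                         # full search space: 0.5
restricted = hits_at_1(src, tgt, pairs, candidate_ids=[0, 1])  # test-only: 1.0
```

With the full target KG as the search space, the distractor wins and Hits@1 is 0.5; restricting candidates to the two test entities removes the distractor and reports a perfect 1.0 — the kind of inflated figure the paper argues does not transfer to real-world alignment.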
Tags: 
Under Review