Review Comment:
In this article, the authors introduce the tasks of weight-aware link prediction and triple classification for knowledge graph completion. They consider triples associated with a weight ((s,p,o), w) and adapt the tasks of Link Prediction (LP) and Triple Classification (TC) to take these weights into account in the evaluation, by introducing weight-aware metrics: WaMR, WaMRR, WaHits@N, and WaF1. They also introduce a framework extending knowledge graph embedding models to consider weights during training. It is noteworthy that this framework seems general (see below) and could potentially be applied to several models, as the authors show in their experimental evaluation, which considers TransE, TransH, DistMult, and ComplEx.
Overall, I appreciate the idea of weight-aware LP and TC, as well as the extension of metrics and training frameworks toward this objective. However, I think the paper in its current form lacks fundamental explanations needed for the reader to clearly understand the implications of the proposal. I also have several theoretical doubts about the proposal. That is why I recommend a major revision.
# Major comments
- Section 2.2 focuses on FocusE, which is of importance for the paper as it is a direct competitor. However, the section does not provide enough details for the reader to get a clear intuition of the model. For example, the definition of negative triples l^- is not given. I also think the equations should be further described, detailed, and exemplified. Furthermore, the concluding remarks after the descriptions of papers [33], [34], [35], and [36] are rather weak. I think the authors could provide here a clear positioning w.r.t. their proposal instead of a rather general statement.
- In Section 3, the authors mention that "weight-aware LP an weight-aware TC have been introduced". Do you mean they have been introduced by other authors before? Or do you mean that you introduce these tasks in this paper?
- Section 3.1.2: the role of the activation function g is not sufficiently described here to understand its impact on the task at hand. Some explanations are provided in the experimental section, but this is too late to clearly understand the contribution in Section 3.1.2 (e.g., Fig. 5 could be here). Furthermore, I don't understand how the definition of r^w_i allows the models to focus on high-weight triples. Indeed, with a linear g:
* A triple ranked 1 with a weight of 100 will be re-ranked to 0.01; if it is ranked 100, it will be re-ranked to 1.
* A triple ranked 1 with a weight of 2 will be re-ranked to 0.5; if it is ranked 100, it will be re-ranked to 50.
The error on the triple with weight 100 is actually smoothed by the re-ranking. As such, I am missing the intuition of why this re-ranking makes models focus on high-weight triples.
The constant u that appears in later formulas is not defined. Additionally, the need for a normalization factor is not motivated. I also wonder whether it still makes sense to compute WaHits@N, since ranks can now be real-valued. That is why I think additional intuitions and examples should be given to describe the behavior of WaMR, WaMRR, and WaHits@N (see the sketch just below, which illustrates both the smoothing and the WaHits@N concerns).
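To make this concrete, here is a minimal sketch (in Python) of the behavior I describe above, under my assumption that the re-ranking is r^w_i = r_i / g(w_i) with a linear g; the paper's exact definition may differ:

```python
# Minimal sketch of the re-ranking concern, assuming r^w = r / g(w)
# with a linear g(w) = w (my reading of Section 3.1.2; the paper's
# exact definition may differ).

def rerank(rank, weight):
    """Re-rank a triple by dividing its rank by its (linear) weight."""
    return rank / weight

for rank, weight in [(1, 100), (100, 100), (1, 2), (100, 2)]:
    rw = rerank(rank, weight)
    print(f"rank={rank:>3}, weight={weight:>3} -> "
          f"re-ranked={rw:>6.2f}, in WaHits@1: {rw <= 1}")

# rank=  1, weight=100 -> re-ranked=  0.01, in WaHits@1: True
# rank=100, weight=100 -> re-ranked=  1.00, in WaHits@1: True
# rank=  1, weight=  2 -> re-ranked=  0.50, in WaHits@1: True
# rank=100, weight=  2 -> re-ranked= 50.00, in WaHits@1: False
```

Under this reading, a high-weight triple ranked 100th still counts in WaHits@1: errors on high-weight triples are smoothed rather than emphasized, and the real-valued re-ranked values make WaHits@N hard to interpret.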
- Section 3.2.2: I have the same remarks about the lack of intuitions and examples as for Section 3.1.2. Additionally, why is the normalization factor needed, and why is it used as a denominator here, contrary to Section 3.1.2 where it is used as a numerator?
- Section 3.3: the authors set the weight w' of negative triples as a hyper-parameter. I do not dispute the difficulty of choosing a weight for non-existent triples. However, I think this choice could be further discussed and explained.
Additionally, your work relies on the pairwise hinge loss. Could other losses be considered? Why choose this one? How can this framework be applied to KGE models that do not rely on the PH loss? I also think the intuitions behind this extension of the PH loss could be better exemplified.
I also have a theoretical doubt about this loss. Indeed, DistMult and ComplEx try to maximize the score of positive triples w.r.t. negative ones (if I am correct). Minimizing your loss comes down to minimizing the score of positive triples and maximizing the score of negative triples. I am thus curious whether this respects the original behavior of these models, whether the framework can be applied to them, and thus whether they can soundly be compared with in the experimental section (see the formulation below).
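For reference, my doubt can be made explicit with the usual formulation of the pairwise hinge loss, written here in the distance-based convention of TransE, where a lower score f indicates a more plausible triple (I assume this is the convention the paper follows, with the weighted extension inheriting the same signs):

$$\mathcal{L}_{\mathrm{PH}} = \sum_{t^+ \in \mathcal{T}^+} \sum_{t^- \in \mathcal{T}^-} \max\big(0,\ \gamma + f(t^+) - f(t^-)\big)$$

Minimizing this loss pushes f(t^+) down and f(t^-) up, which is correct for distance-based models. For DistMult and ComplEx, however, a higher score indicates a more plausible triple, so one would expect the flipped margin \max(0,\ \gamma - f(t^+) + f(t^-)). If the weighted extension keeps the distance-based convention for all models, it seems to invert the original behavior of DistMult and ComplEx.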
# Minor comments
- Several times (in the abstract and in the text), you mention that "Link Prediction and Triple Classification are widely adopted to evaluate the performance of knowledge graph embeddings". I would argue that some models are specifically designed for LP and TC while others are not (e.g., RDF2Vec). Evaluating the performance of KGE models through LP and TC is only one possible way to go and does not represent a holistic evaluation of KGEs, especially since other metrics are being introduced, e.g., Sem@K [1], CO2/energy consumption [2], explainability [3], etc.
- "Indeed, the traditional evaluation metrics, such as Link Prediction and Triple Classification": LP and TC are not metrics. Did you mean "tasks"?
- The subsection heading "1.0.1 Twofold contributions" could be removed, and its paragraphs simply added to Section 1 - Introduction.
- Incomplete sentence "The hyper-parameter \beta \in [0,1]."
- The title of Section 2.3.3, "Tail Entity Prediction", is strange, as the section seems to present the task of link prediction in an uncertain KG.
- p6, l27: a comma starts the line.
- Figure 3 is not commented on in the text, and I am therefore not sure of its usefulness.
- Figure 5 is interesting but should come earlier in the paper. The various epochs considered for the dynamic base should appear in the figure so that their impact can be visually understood.
- Section 4.3: what is "WeExt"?
- Tables 2, 3, and 4: the weight-aware metrics (WaMR, WaMRR, WaHits@N) are not used, but their traditional counterparts are. Why?
- I do not understand Figure 6. I think the axes should be labeled.
# References
[1] Nicolas Hubert, Pierre Monnin, Armelle Brun, and Davy Monticolo. Sem@K: Is my knowledge graph embedding model semantic-aware?
[2] Xutan Peng, Guanyi Chen, Chenghua Lin, and Mark Stevenson. Highly efficient knowledge graph embedding learning with Orthogonal Procrustes Analysis.
[3] Andrea Rossi, Donatella Firmani, Paolo Merialdo, and Tommaso Teofili. Explaining link prediction systems based on knowledge graph embeddings.