CAFE: Fact Checking in Knowledge Graphs using Neighborhood-Aware Features

Tracking #: 2449-3663

Agustin Borrego
Daniel Ayala
Inma Hernandez
Carlos R. Rivero
David Ruiz

Responsible editor: 
Philippe Cudré-Mauroux

Submission type: 
Full Paper
Abstract:
Knowledge Graphs (KGs) currently contain a vast amount of structured information in the form of entities and relations. Because KGs are often constructed automatically by means of information extraction processes, they may miss information that was either not present in the original source or not successfully extracted. As a result, KGs might potentially lack useful and valuable information. Current approaches that aim to complete missing information in KGs either have a dependence on embedded representations, which hinders their scalability and applicability to different KGs; or are based on long random paths that may not cover relevant information by mere chance, since exhaustively analyzing all possible paths of a large length between entities is very time-consuming. In this paper, we present an approach to completing KGs based on evaluating candidate triples using a novel set of features, which exploits the highly relational nature of KGs by analyzing the entities and relations surrounding any given pair of entities. Our results show that our proposal is able to identify correct triples with a higher effectiveness than other state-of-the-art approaches (up to 60% higher precision or 20% higher recall in some datasets).


Solicited Reviews:
Review #1
Anonymous submitted on 31/May/2020
Major Revision
Review Comment:

The paper tackles the problem of knowledge graph (KG) completion using internal features, i.e., features that are computed using only the KG itself.
Specifically, the authors propose an approach (called CAFE) that evaluates candidate triples for KG completion using a set of neighborhood-aware features, i.e., features that consider the entities and relations surrounding any given pair of entities. This feature set is used to transform triples in the KG into feature vectors, which are then fed to neural models for predicting which of the candidate triples are correct and should be added to the KG.
The evaluation results over four ground-truth datasets show that the proposed method outperforms (on average) six state-of-the-art approaches.
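For readers unfamiliar with this style of KG completion, the pipeline summarized above can be sketched in a few lines. The toy graph, the entity names, and the four features below are invented for illustration only and are not the paper's actual feature set; the resulting vectors would be fed to a per-relation binary classifier.

```python
# Hypothetical sketch: turn a candidate entity pair into a small
# neighborhood-aware feature vector (all names invented for illustration).
from collections import defaultdict

# Toy KG as (source, relation, target) triples.
kg = [
    ("Book1", "hasSequel", "Book2"),
    ("Book1", "writtenBy", "AuthorA"),
    ("Book2", "writtenBy", "AuthorA"),
    ("AuthorA", "bornIn", "CountryX"),
]

neighbors = defaultdict(set)
for s, r, t in kg:
    neighbors[s].add(t)
    neighbors[t].add(s)  # ignore edge direction when building neighborhoods

def featurize(source, target):
    """Neighborhood-aware features for the candidate pair (source, target)."""
    src, tgt = neighbors[source], neighbors[target]
    return [
        len(src),            # size of the source's 1-hop neighborhood
        len(tgt),            # size of the target's 1-hop neighborhood
        len(src & tgt),      # shared neighbors, i.e. length-2 paths between the pair
        int(target in src),  # direct edge between the pair
    ]

print(featurize("Book1", "Book2"))  # -> [2, 2, 1, 1]
```

A classifier trained on such vectors for one relation (positives from the KG, negatives from corrupted triples) would then score unseen candidate pairs.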

The paper is in general easy to read, well written, technically sound, and tackles a very interesting problem.

The major issue of the paper concerns the selection of the baselines. In the related work section, the authors state that the proposed method falls under the category of works that are path-based. However, it is not compared to any such related work, like [17], [24], [32] and [33]. It is also not compared to more recent works that are based on embeddings, like [27] and [28], which can also be considered state-of-the-art works. Is there a specific reason that prevented you from comparing your method with such more relevant and more recent works?

With respect to the evaluation results, precision is in general very low (<50%) for a large number of relations. Given that a knowledge graph should contain true facts, i.e. "knowledge", I'm wondering about the usefulness and practical applicability of the proposed method and of the considered baselines. Why complete a KG using such methods if you know that half of the new triples are not correct?

It is not clear to me why the proposed method "can be applied at any time as a KG grows with new entities and relations, without the need of a complete recomputation in opposition to embedding-based approaches" (this claim is repeated many times in the paper). If new entities and relations are added to the KG, then all feature values change, which means that new neural models need to be trained and evaluated for each relation based on the new feature vectors. Otherwise, performance might be low. Isn't this true? What is different in your method compared to other works that rely on embedded representations? Couldn't they also make use of the same embeddings when new data are added to the KG?

Given that this is a journal paper, I would expect to see a more detailed evaluation, such as a detailed error analysis as well as a feature analysis that demonstrates the importance of each feature group. For example, I expect that the feature groups f1, f2, f5 and f6 are not important (I do not see the intuition of using them; more below).

About the title: I find it a bit misleading for two reasons:
i) the paper is actually about KG completion using internal features, not just checking the validity of triples. So, I would expect to see "KG completion" in the title. By reading the title as it is now, I thought that the paper was about a method to validate the existing triples of a KG.
ii) "Fact checking" is widely used in the context of fake news and refers to verifying information in non-fictional text in order to determine its veracity and correctness (like claims that are fact-checked by PolitiFact, Snopes, etc.). Although I know that this term has been used in KG-related works, I would suggest that the authors use a different term, e.g., fact verification.

Other comments:
- Abstract: "highly relational nature of KGs" (mentioned also in the RW section)--> This is not clear to me. What is this "relational nature" of a KG?
- Section 1, last paragraph --> *CAFEuses*
- Section 2: I would like to see a distinction of the related works in terms of the use of internal or external features.
- Section 2: I would like to see a paragraph at the end explaining the similarities and differences of the proposed method compared to the mentioned previous works, in terms of model used, similarity of considered features, etc. What features do the related work make use of?
- Section 3.1: The work seems to ignore literal properties which, though, are very common in all KGs, like properties pointing to dates, numbers, strings, etc. Is this true, or does "entity" also cover dates, numbers, etc.?
- Section 3.2: I would like to see the intuition behind some of the feature groups, in particular features groups f1, f2, f5 and f6. For example, what is the intuition for considering the number of entities in the neighborhood subgraph of the source entity? How can this number help in deciding if a triple should be included in the KG?
- Section 3.2 - Feature group f5: "f5(2, hasPrequel)" --> "f5(hasPrequel, 2)" or "f5(r, n)" --> "f5(n, r)" ? (the same for feature group f6)
- Section 4.5: "we first remove any individual features that have the exact same value..." --> Which ones did you remove in your experiments?
- Section 5.1: "Relations that make up for less than 5% of the total amount of triples in the graph have been removed." --> Why? What amount of relations and triples were removed?

Review #2
Anonymous submitted on 23/Jun/2020
Review Comment:

This paper presents CAFE, a path-based approach for fact-checking in knowledge graphs. To predict whether a fact (triple) exists, a neural binary classifier is used, which is trained on a novel set of features proposed by the authors. This feature set captures the information about the neighborhood subgraphs of the source and target entities as well as the paths between them.

The paper is well structured and contains many illustrative examples, which makes it easy to follow. The proposed approach has certain advantages over embedding-based and random-walk-based methods. On the one hand, CAFE relies solely on local subgraphs and thus does not require retraining the whole model whenever new triples are added to the graph. On the other hand, the model is completely deterministic, as it considers all the relevant information on each iteration of training. Another advantage of this method is that it can be easily applied to any knowledge graph and does not require complex preprocessing.
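To make the "paths between the source and target entities" idea concrete, here is a minimal sketch of the kind of path evidence such an approach can exploit: enumerating all relation sequences of bounded length connecting two entities. The toy triples and names are hypothetical, not taken from the paper.

```python
# Minimal sketch: enumerate relation paths of bounded length between two
# entities in a toy KG (names invented for illustration).
from collections import defaultdict

triples = [
    ("Book1", "hasSequel", "Book2"),
    ("Book1", "writtenBy", "AuthorA"),
    ("Book2", "writtenBy", "AuthorA"),
]

adj = defaultdict(list)
for s, r, t in triples:
    adj[s].append((r, t))
    adj[t].append((f"inv_{r}", s))  # allow traversal against edge direction

def relation_paths(source, target, max_len=2):
    """All relation sequences of length <= max_len connecting source to target."""
    paths, frontier = [], [(source, ())]
    for _ in range(max_len):
        next_frontier = []
        for node, path in frontier:
            for rel, nbr in adj[node]:
                new_path = path + (rel,)
                if nbr == target:
                    paths.append(new_path)
                next_frontier.append((nbr, new_path))
        frontier = next_frontier
    return paths

print(relation_paths("Book1", "Book2"))
# -> [('hasSequel',), ('writtenBy', 'inv_writtenBy')]
```

Counts or indicators of such paths (e.g. "shares an author") are then usable as classifier features, deterministically, without sampling random walks.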

The novelty of this paper is very limited. Leveraging subgraph features for fact-checking has been proposed by Gardner and Mitchell in "Efficient and Expressive Knowledge Base Completion Using Subgraph Feature Extraction", and exploiting paths between entity pairs as training features is also an established technique. The overall approach does not demonstrate a big improvement over the state of the art.

The algorithm itself and its comparison methodology against other approaches raise several serious concerns.
First of all, it is not clear from the paper why separate classifiers are trained for each relation. This reduces the size of the training set by a factor of ten and does not seem to bring any additional benefits. Moreover, training other models such as TransE and ComplEx on such reduced datasets will result in suboptimal performance, and thus the comparison is not fair in this set-up.
Second, generating additional negative examples for the test sets results in different distributions of the training and test sets. Even assuming that the presented model is resilient to the mismatched data distribution (this fact is not mentioned in the paper though), other models against which CAFE is compared are most probably not, and this can also result in poor performance.
The reported results (Figure 6) are very inconsistent: the same version of the algorithm demonstrates vastly different performance across different types of relations and datasets. The evaluation results of one of the baseline algorithms, ComplEx, look surprisingly bad: all metrics are close to zero across all datasets. This might indicate that the chosen evaluation protocol is not suitable for this kind of task.
Finally, some conclusions drawn from the results seem to be overgeneralized. For example, in section 5.3 the authors state that “larger neighborhood subgraphs do not provide additional value or predictive power over smaller ones” after comparing subgraphs of size 1, 2, and 3, which is not sufficient to sustain this finding.

The paper contains typos and grammatical mistakes and should be proofread by a native speaker.

Review #3
Anonymous submitted on 03/Jul/2020
Major Revision
Review Comment:

The CAFE approach proposes a novel algorithm for verifying the truthfulness of relations (facts) in knowledge bases based on a set of hand-crafted features that are fed into relation prediction neural models. These features are based on the notion of the vicinity of entities in knowledge graphs and their similarities; the authors argue that these representations outperform state-of-the-art embedding-based approaches (in particular, that they can account for dynamic modifications of the underlying data).

The paper is overall well-written, although proofreading for English is recommended (see some of the minor remarks below). The approach is clearly presented and appears convincing in terms of performance; the evaluation section looks sound and accurate. Below are my comments.

I would have liked to see a more in-depth discussion of the positioning and originality of the approach as compared to other approaches based on hand-crafted features (the authors only talk about path-based approaches). This would help to more clearly understand and assess the originality of the proposed method.

Certain claims need further explanation and detail. For example, « embedded representations, which hinders their scalability and applicability to different KGs; »: this statement is rather general and needs proof or a reference - certain embedding models can be transferred successfully to application fields and data other than their original ones. Also, why would random walks « miss relevant information because of their indeterministic nature »?

Regarding KG construction (line 41), it is worth mentioning methods that rely on a « human-in-the-loop » kind of approaches.

There is confusion in the narrative between missing and accurate information. As the authors underline, the OWA implies that these two things are different, but they are nonetheless constantly treated as equivalent (see e.g. line 38). After reading through the introduction, it becomes clear that the paper talks about KG completion, while the approach description illustrates that one is dealing with correcting errors. Please provide clear details on that issue and the precise task throughout the text.

In that respect, I would not use the term 'fact checking' (in the title and elsewhere), which in this context appears somewhat misleading (although used sometimes for KGs, the more common reference is that of claim veracity verification in social nets or news outlets). Line 37, please correct the statement: « The task of creating, training and applying such a model is known as fact checking [21, 22]. »

I appreciate the authors' effort to formally lay the grounds of their approach by providing definitions and illustrating them with examples. However, I don't think that a paper intended to appear in SWJ needs extensive and detailed definitions (even accompanied by examples) of basic notions such as triples and KGs.

The evaluation is performed in two blocks: comparing the different versions of CAFE against one another and comparing CAFE against state of the art (mostly embeddings-based) approaches. The reported results show that CAFE compares reasonably to existing neural embedding methods and in some cases it outperforms them in terms of standard metrics.

The authors mention as a limitation the computational cost of the approach. Since this is also a limitation for many of the sota approaches that are included in the evaluation, it would be interesting to report on how CAFE compares to those methods with respect to this parameter.

I appreciate the fact that the authors make all datasets and source codes available publicly.

Minor comments:

- The statement « Knowledge Graphs (KGs) are vast repositories of structured information whose growth over the past years can be related to that of the Web of Data » is vague and unclear.
- « said information is then semantized » (also elsewhere the use of « said ») - check the English, what is meant here probably is « the abovementioned » or the like
- Finally, those that are path-based —> Finally, path-based techniques
- I would not use a JK Rowling-related example, which somewhat promotes her work, while she's currently involved in a homophobia-related scandal.
- We define a triple as a 3-tuple — there must be a more elegant way of stating that
- Our proposal, CAFE, receives a KG — that sounds a bit strange (our system?)
- for CAFE1 through CAFE3 —> for CAFE1, CAFE2 and CAFE3