Towards Explainable Automated Knowledge Engineering with Human-in-the-loop

Tracking #: 3732-4946

Authors: 
Bohui Zhang
Albert Meroño-Peñuela
Elena Simperl

Responsible editor: 
Guest Editors KG Construction 2024

Submission type: 
Full Paper
Abstract: 
Knowledge graphs are important in human-centered AI as they provide large labeled machine learning datasets, enhance retrieval-augmented generation, and generate explanations. However, knowledge graph construction has evolved into a complex, semi-automatic process that increasingly relies on black-box deep learning models and heterogeneous data sources to scale. The knowledge graph lifecycle is not transparent, accountability is limited, and there are no accounts of, or indeed methods to determine, how fair a knowledge graph is in downstream applications. Knowledge graphs are thus at odds with AI regulation, for instance, the EU's AI Act, and with ongoing efforts elsewhere in AI to audit and debias data and algorithms. This paper reports on work towards designing explainable (XAI) knowledge graph construction pipelines with humans in the loop and discusses research topics in this area. Our work is based on a systematic literature review, in which we study tasks in knowledge graph construction that are often automated, as well as common methods to explain how they work and their outcomes, and an interview study with 13 people from the knowledge engineering community. To analyze the related literature, we introduce use cases, their related goals for XAI methods in knowledge graph construction, and the gaps in each use case. To gain an understanding of the role of explainable models in practical scenarios, and to reveal the requirements for improving current XAI methods, we designed interview questions covering broad transparency and explainability topics, along with discussion sessions using examples from the literature review. From practical knowledge engineering experience, we collect requirements for designing XAI methods, propose design blueprints, and outline directions for future research: (i) tasks in knowledge graph construction where manual input remains essential and where AI assistance could be beneficial; (ii) integrating XAI methods into established knowledge engineering practices to improve stakeholder experience; (iii) the need to evaluate how effective explanations genuinely are at making human-machine collaboration in knowledge graph construction more trustworthy; (iv) adapting explanations to multiple use cases; and (v) verifying and applying the XAI design blueprint in practical settings.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 25/Sep/2024
Suggestion:
Minor Revision
Review Comment:

The paper addresses an interesting and timely topic, analyzing the role of XAI methods and techniques in KGC tasks with human intervention, while posing four RQs. The study is particularly relevant given the increasing automation and use of LLMs in the knowledge graph construction process, where common challenges must be addressed. To tackle this, the authors have designed a hybrid methodology that mainly combines a review of the scientific literature and expert interviews.

My comments are as follows:

1. Review the concepts of transparency and explainability: post-hoc methods will not provide transparency into a machine learning model such as XGBoost or an LLM.

2. The use of terms is not sufficiently clear. I would suggest using "models" when referring to academic solutions that involve machine learning, even if they are not related to explainability. When referring to techniques such as LIME, I would call them XAI techniques. If a model inherently uses techniques to address explainability, I would also specify this (see the sketch after this list for an illustration of the distinction).

3. When introducing an acronym, please spell it out (e.g., KGC: Knowledge Graph Construction).

4. Figure 1. Knowledge Graph Construction is not the same as Ontology Engineering, so this task must be separated in Fig. 1 or omitted.

5. Regarding the organization of the paper:

5.1. In the introduction, I would not anticipate the conclusions of the work; instead, I would introduce the sections of the paper and their relationship to the RQs.

5.2. In this regard, the work done is very thorough. The methodology consists of two basic activities, the literature review and the interviews, and these activities are linked to addressing the different RQs. I missed a figure that links these phases and activities to the corresponding RQ answers. As it stands, context is lost during the presentation of the interview results with respect to the objectives they aim to address. Moreover, I suggest summarizing the conclusions obtained from the questionnaires, for example in a table with the main conclusions along with a brief explanation of how these responses impact the stated objectives. This would make the article easier to read.

5.3. I would consider moving some tables to appendices (for example, the interview design on page 11).

5.4. I would link the use cases to the specific task in the context of knowledge graph construction (table 6).

5.5. Section 4.1 should follow the structure of Figure 3, i.e., grouping XAI dimensions and tasks into sub-sections.

5.6. Section 4.4.2 is significant enough to be moved to a new Section 5. In this sense, your proposal can be viewed as a future guideline or methodology that leverages many of the gaps identified in the paper.
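
To illustrate the distinction in points 1 and 2, here is a minimal sketch (my own example, assuming scikit-learn and the `lime` package; it is not taken from the paper under review) of a post-hoc, model-agnostic XAI technique applied to a black-box classifier. It produces a local explanation of a single prediction, not transparency into the model's internals:

```python
# A post-hoc, model-agnostic XAI technique (LIME) applied to a black-box
# gradient-boosting classifier. It explains one prediction locally; it does
# not make the model itself transparent.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_iris()
model = GradientBoostingClassifier().fit(data.data, data.target)  # black box

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)
# LIME fits an interpretable surrogate model around a single instance by
# perturbing it and querying the black box's predict_proba.
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=4)
print(exp.as_list())  # local feature weights for this one prediction
```

Note that the explainer only queries predict_proba and never inspects the model's internals, which is why such techniques provide (local) explainability but not transparency.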

Review #2
By Sitt Min Oo submitted on 26/Sep/2024
Suggestion:
Major Revision
Review Comment:

Decision: Major Revision

## High-level Review

I would like to thank the authors for this submission. The paper summarizes existing XAI techniques and conducts a survey to find possible applications of XAI techniques in the domain of knowledge graph engineering. It is an easy-to-digest paper for introducing XAI to the knowledge engineering community. The following is the feedback I have for this submission.

## General feedback

- There is previous work by the authors which covers a third of the content in
this paper
[https://doi.org/10.3233/FAIA230091](https://doi.org/10.3233/FAIA230091)
word for word. If this journal submission is an extension of that previous
paper, there needs to be a clear indication in the introduction section: i)
clarifying what this work contributes on top of the previous work (the
delta), and ii) stating that this work is a clear extension of the previous
work and not a standalone work.

- In my opinion, this paper could be evaluated better as a **survey paper**
rather than a full research paper. A majority of this work focuses on
studying a significant amount of related literature (specifically other
systematic review papers on XAI) in order to analyze, summarize, and
categorize it, derive new findings from those existing works, and conduct
an interview study to verify the findings. Otherwise, if evaluated as a
full research paper, this paper, as is, lacks the originality of a full
research paper: the original contribution is not clearly stated, and the
results of the interview study are not significant enough to push new
boundaries; they merely make the requirements for XAI methods in knowledge
graph engineering clearer. Thus, I cannot recommend accepting this paper in
its current form as a full research paper.

- In the methodology section for the literature review, the use cases are
defined without any motivation or citations explaining why this particular
set of use cases is used for the analysis of the collected literature. How
are these use cases derived from the literature? Is there any literature to
support these use cases in terms of AI usage in knowledge graph
construction?

There are also a few problems with the provided use cases. Although the
following two problems are handled in the interview study methodology, they
are not mentioned in the literature review methodology. Firstly, I do not
see any mention of compatibility with regulations as part of the use case
analysis, even though it was a significant focus in the problem-context
paragraph of the introduction section. Is it possible to also consider
compatibility with regulations as part of the analysis of the literature?
Secondly, there is no mention of data provenance considerations in the use
cases, which is a significant part of the problem context described in the
introduction section (this is also related to the previous point about
compatibility with regulations).

- For the pool of interview participants, it would be interesting if there
were more participants from industry (currently, there are only 3 industry
participants); as it stands, the results of the interview study lean
towards academia.

- Explanation Design (Figure 6) and Section 4.4.2 only give details on the
requirements analysis for **users**, **use cases**, and **representations of
explanations**. A major component, **regulations**, is only briefly
mentioned in the paper and **not discussed in depth**, leaving it as just
background context for the introduction section of the paper. Similarly,
the "Evaluation" step is also left out of the in-depth discussion in
Section 4.4.2.

Regarding the XAI design blueprint step on "Evaluation": are there
recommendations on how to select the types of metrics/dimensions when
evaluating XAI models? If this is mentioned somewhere in the "Findings"
sections, it would be nice to reiterate the recommendations when describing
the proposed XAI design blueprint.

- It would also be nice to indicate where/which part of the findings answers
the 4 research questions mentioned in the introduction section.

- A few citations lack either a DOI or a URL to the paper. It would be nice to
have URLs to follow the citations.

## Detailed Review

### Introduction

**Page 2:**

- KG lifecycle --> what do you mean by KG lifecycle? The construction phase?
The usage of KGs? The storage of KGs? (Rereading it, it is fully introduced
in Background 2.4, but it would be nice to have a short sentence explaining
what it is in the introduction.)

- Most regulators take a risk-based approach to the use of AI ... are
compliant with the law (line 10-13) --> is there a citation to support these
two statements?

- Up-to-date comparative surveys... (line 20-21) --> this statement is very
disconnected from the rest of the paragraph on human-centric approach.
Remove it if it's not needed.

- ... we would like to advance the field of **explainable knowledge
engineering** (line 23-24) --> How? A sentence or two showing "how" would
strongly support this statement. If the "how" is the development of
human-in-the-loop approaches for transparency and accountability, the
accompanying sentence needs to be rewritten/restructured for more clarity.

### Background

Page 4:

- Reviews and surveys ... from **end-users** have also become increasingly
common (line 11-12) --> This looks very out of place, since the paragraph
is mostly focused on XAI without the involvement of end-users. It would
also need citations if you decide to keep this statement.

Page 5:

- KGs are interacting with AI capabilities in complex ways (line 30) -->
How/what are the complex ways in which AI interacts with KGs in the figure?
From the figure it looks pretty simple, since the **input** for stage C,
where I assume most of the AI methods/models are, comes from stage D and,
as **output**, it enriches the generated KG. Similarly, stage D also has a
very clear input/output direction.

- While KGs constructed using these approaches ... similar transparency
challenges as the algorithms it complements (line 45-47) --> Doesn't this
mean that the crowdsourcing approach, in general, is a bad idea, since it
results in biased, bad-quality data while also suffering from transparency
issues? This sentence doesn't read well.

### Methodology

Page 9:

- The tasks in Table 3 are not aligned with the tasks mentioned in Stage B of
Figure 1. Is this intentional? If yes, it would be nice to also have
another column with relevant tasks that are aligned with the ones provided
in the figure for the knowledge graph construction stage of the KG
lifecycle.

Page 9-10:

- 3.2.1 Interview questions section (2 paragraphs)

The first paragraph leads the reader through the interview process
step-by-step until the end, where risk concerns are addressed. It reads
well and has an _order_ to it. However, the second paragraph comes in
totally disconnected, talking about the "examples" selection, which I
believe is for the topics **Use Cases**, **XAI Example Discussion**, and
**Requirements** of the interview.

I think it would read better to make the second paragraph a separate
subsection titled "Examples and use cases selection from literature process"
and link the sentences "Inspired by \[55\], we designed ... concerns,
challenges, and requirements" (Page 9 line 45-47) to that section.

### Findings

#### SOTA study/review

Page 12:

- a human-in-the-loop system that complies ... (line 41) --> compiles (do you
mean compiles?)

- For instance, NERO uses... (line 51) --> ... NERO \[++citation\] (citation
missing)

Page 13

- SIRE employs... (line 20) --> SIRE \[++citation\] (citation)

- Beyond NERO, LogiRE ...(line 24) --> LogiRE \[++citation\] (citation)

- Diverging from text-based explanations, ProtoRE ...(line 24) --> ... ProtoRE
\[++citation\] (citation)

- RULESYNTH, proposed by Singh et al.... (line 35) --> citation reference
link missing?

- Last sentence on _Entity Resolution_: "Additionally, ... attempt to add
them to make non-matching pairs more similar." It took me a while to read
this sentence and understand it. If it is about entity pairs which are
different/non-matching but **contextually** similar due to the input
attributes, this sentence needs to be rewritten for clarity.

Page 14:

- which require feeding more data and extending training time. (line 31) -->
which require feeding more data **thus** extending training time (it reads
better this way?)

- The relationship between the complexity of functions and ... educate them
--> What kind of relationship? Do complex functions plus more freedom of
operations lead to less time required to educate the users?

- approxSemanticCrossE proposed explanation... target the link --> targeting?

#### Use cases and capabilities

Page 15:

- Among the use cases, three areas, ... (line 42-43) --> Which three areas?

- such as explainers designed for any knowledge... and some mode-specific
methods... --> such explainers designed for **both** any knowledge ... and ?

Review #3
By Irene Celino submitted on 07/Nov/2024
Suggestion:
Minor Revision
Review Comment:

In general, the paper is a very good contribution and it is very welcome now, because the topics of explainability and human-in-the-loop AI are very timely in knowledge graph construction (KGC).

With respect to the global evaluation criteria:
- Originality is pretty high, as I'm not aware of any other work that specifically addresses this area at the intersection of explainability and KGC
- Significance of results is also good: the analysis was carried out very professionally, both for the literature review and the interview study; the proposed "blueprint" is very interesting and valuable, even if it could be made stronger and more prominent in the paper, as detailed below
- Quality of writing is quite high as well, as the paper is quite easy and pleasant to read; there are some possible improvements to the narrative and structure of the paper, as detailed below

Some more detailed comments on different aspects of the paper are offered hereafter.

Paper title: it seems to me that the paper mostly refers to KGC tasks, rather than knowledge engineering in general; I'd recommend reflecting this in the title as well, putting "knowledge graph construction" instead of "knowledge engineering".

Concept of explanation: throughout the paper I felt the need for a definition of what the authors mean by "explanation" and some more examples; while everybody has an intuitive understanding of what can constitute an explanation, I think that a definition and some more examples would greatly help the reader to correctly interpret the presented work. This definition can either be placed at the beginning to clarify the scope of the paper or be offered at the end as part of the blueprint to clarify what an explanation should look like. It would also be great to add an example of an explanation for each of the KGC tasks, which are quite diverse and so may need different kinds of explanation. The authors correctly identify the paper by Miller (reference [50]) in their literature review; there are a couple of other papers that could be used to better scope the definition of explanation (especially in the dichotomy between something that explains what the machine does internally vs. something that is useful for the human user to understand whether the machine result is relevant/correct):
- Miller, Tim, Piers Howe, and Liz Sonenberg. "Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences." arXiv preprint arXiv:1712.00547 (2017).
- Mittelstadt, Brent, Chris Russell, and Sandra Wachter. "Explaining explanations in AI." Proceedings of the conference on fairness, accountability, and transparency. ACM, 2019.

Literature review: the state-of-the-art analysis was performed very well in general, but Section 4.1 with the findings is a bit hard to read because, for the sake of brevity, it is very condensed and refers to a lot of technical details that are (correctly) not fully explained. To improve this, the authors could either shorten the discussion even more or add some examples here and there to clarify the different approaches.

Use cases (e.g. in Table 2): I find the naming a bit odd, because usually a use case is an application scenario related to a specific domain/context, with a description of the needs/goals of a set of users/stakeholders. The "use cases" in this paper are more like moments/steps of an ML lifecycle, so I'd suggest the authors find a different name. Also, the reader is left wondering whether those "use cases" are specific to KGC or generic; in the latter case, it would be very helpful to give some examples of explanations of those ML steps in the KGC context (for example, by adding a column to Table 2).

Blueprint: while it is a very relevant contribution of the paper and quite prominent in the abstract, it does not strongly emerge from the current narrative of the paper as a proposal, but rather as a "surfacing" result of the analysis. I'd suggest the authors devote a separate section to the blueprint (extracting and reshaping contents from Sections 4.3 and especially 4.4), expanding it and proposing more concretely how to apply/follow it: Figure 6 is not fully/precisely explained, and a list of best practices (in the form of a checklist?) could represent a valuable additional contribution. Moreover, since human-in-the-loop is an important point of the paper's motivation (and title!) that turned out to be covered only to a very limited extent in the specific KGC-related literature, I would welcome some further considerations/speculations about the applicability of human-in-the-loop best practices to KGC, starting from the literature coming from different areas.

Paper structure: while it follows the usual paper structure (methods first, findings after), I would recommend having the methods and findings of the literature review first (i.e., Sections 3.1 and 4.1) and then the methods and findings of the interview study after (i.e., Sections 3.2 and 4.2). While the two analyses are clearly related, during my reading I found myself going back and forth in the paper to understand it better.

Review #4
Anonymous submitted on 15/Dec/2024
Suggestion:
Accept
Review Comment:

The article provides an overview of methods and use cases for explainable automated knowledge engineering with human-in-the-loop. The authors performed a systematic literature review, analyzed use cases, designed and conducted interviews, and discuss directions for future research.
The work is well motivated; the authors follow a sound methodology and clearly defined research questions. The presentation of the results is clearly structured and easy to follow. The findings are presented following a thorough overview of relevant background on explainable AI and the knowledge graph lifecycle.
When it comes to the use cases covered, I had in parts the impression that the selection is skewed by what has been published/can be found in the literature rather than by the use cases and methods that would really be relevant from the perspective of knowledge engineering/knowledge graph construction. For example, a significant part of the analyzed work is about link prediction – clearly a task that has recently received a lot of attention and is interesting from an automation and explainability perspective, but not really a core task in knowledge graph construction (not even according to the knowledge graph lifecycle shown in Figure 1). In turn, interesting core knowledge engineering tasks are underrepresented. I acknowledge that redoing these central parts of the selection of covered work is not feasible; also, the results are still meaningful. I am therefore not asking for a revision. What I believe could be revised is the taxonomy of explainable KGC in Figure 3: with the multi-classification by KGC tasks and XAI dimensions, every approach/reference is listed twice. I believe a presentation in a matrix (tasks horizontally, XAI dimensions vertically, or the other way around) would be more digestible.
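
For illustration, a minimal LaTeX sketch of such a matrix layout (the labels and cell entries here are placeholders of my own, not the actual classification from Figure 3; "[refs]" stands for the citations that would go in each cell):

```latex
% Sketch of the suggested matrix presentation: KGC tasks as rows,
% XAI dimensions as columns (or transposed). Each reference then
% appears exactly once, at the intersection of its task and dimension.
\begin{tabular}{l|ccc}
  KGC task $\backslash$ XAI dimension & Post-hoc & Intrinsic & \dots \\ \hline
  Entity resolution   & [refs] & [refs] & \dots \\
  Relation extraction & [refs] & [refs] & \dots \\
  Link prediction     & [refs] & [refs] & \dots \\
\end{tabular}
```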
Overall, the work is original, the findings interesting and relevant, the discussion of research directions meaningful. I therefore recommend acceptance.