KGG4SE: A Knowledge Graph Generation Framework for Systems Engineering

Tracking #: 3844-5058

Authors: 
Frank Wawrzik
Khushnood Adil Rafique
Theogene Urimubenshi
Christoph Grimm

Responsible editor: 
Guest Editors 2025 LLM GenAI KGs

Submission type: 
Full Paper
Abstract: 
In this paper, we introduce the Knowledge Graph Generation Framework for Systems Engineering (KGG4SE). Based on the GENIAL! Basic Ontology (GBO), a variety of large language models, and prompt engineering, we generate numerous knowledge graphs from a diverse set of input sources. One of the key features of the framework is the reasoning-in-the-loop integration: the generated classes are checked for structural consistency, and inconsistent classes are removed. Further features include generation from research articles, technology videos, and datasheets. In addition, quality control prompts are used, and the framework is integrated into a systems engineering tool (SysMD) with frontend and backend. This makes the content of the knowledge graph accessible to users in MBSE (Model-Based Systems Engineering) ecosystems. Finally, we outline the results of the generation process and the content of the graphs, as well as the reasoning process with disjoint axioms. The results show improved graph quality and structure in comparison to existing approaches. In terms of succinctness and conciseness, we remove a total of 67.4% of classes that do not adhere to the ontological entailment or domain. Although graph quality is sometimes difficult to qualify and quantify and the implementation needs further work, we believe the methodology of this framework is a good way forward to produce better-quality, scalable graphs.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Enrique Iglesias submitted on 30/Sep/2025
Suggestion:
Major Revision
Review Comment:

General Comments
Summary
This paper presents a framework for generating knowledge graphs (KGs) in the field of systems engineering, known as the Knowledge Graph Generation Framework for Systems Engineering (KGG4SE). The proposed framework utilizes Large Language Models (LLMs) both to extract data from multiple sources, such as data sheets, PDF files, and technical videos, and to evaluate the quality of the KG.

This paper is well-written and provides extensive detail in describing the structure and functionality of the proposed framework. Unfortunately, this paper lacks significant content that would otherwise enrich the reader's understanding. The paper does not clearly state the problem it is addressing. It remains ambiguous whether the focus is on KG construction in systems engineering, schema unification across heterogeneous sources, or domain-agnostic KG construction. While related literature is listed, the framework is not critically compared with existing approaches, making it difficult to assess its novelty or relevance. No ablation study is conducted, and the choice of specific LLMs for different tasks is not justified. Moreover, the results section lacks clear evaluation metrics and comparison with state-of-the-art methods.

Section Comments
Section 1: Introduction
Positive: This section aims to introduce the proposed framework and outline the paper's overall structure. It also describes, to a certain extent, the authors' motivation for developing the proposed framework.
Negative:
-This section lacks a clear definition of the problem being addressed. It is unclear whether the authors aim to solve an issue explicitly related to KG construction in systems engineering, the unification of heterogeneous data sources into a common schema, or a more general form of KG construction that is independent of the domain. Additionally, a formal definition of the problem should be provided at a later point in the paper. This will aid in understanding the main objective behind this work.
-The authors do not highlight the contributions of this paper.
-This section would benefit from a motivating example or a stronger discussion of the motivation behind this work.
-The acronyms KG and LLM are introduced in this section but used very inconsistently throughout this section and the paper as a whole.
-“Widespread application” should be “widespread applications”

Section 2: State-of-the-art
Positive: This section introduces a long list of works that utilize LLMs and KGs for a specific purpose or employ LLMs for the generation of KGs.
Negative:
-The authors do not highlight the shortcomings of the presented papers. Additionally, the proposed framework is not positioned or compared against any of the works presented in this section.
-The reference to “15 other papers” in subsection 2.2 is unhelpful without details. Specify the papers in question or remove the statement.
-The section could be strengthened by including KG construction approaches that do not utilize LLMs (for example, SDM-RDFizer, Morph-KGC, RMLMapper, Flex-RML, OnTop) to demonstrate the added value of LLMs more effectively.
-Consider renaming the section “Related Work” and subsection 2.1 to “Works that Utilize KG and LLMs.”

Section 3: The Knowledge Graph Generation for Systems Engineering (KGG4SE) Framework
Positive: This section goes into extensive detail on the inner workings of the proposed framework, including how the different data formats are processed and the functionalities of the frontend.
Negative:
-The authors should clarify what type of KG is being generated. For example, is the KG an RDF KG or a Property Graph? Given that the authors use Neo4j, it is likely a property graph; however, some clarification would be beneficial.
-The authors mention that they utilize different LLMs for specific tasks, such as using GPT-4 for video processing and Gemini for data sheet processing. The authors do not explain why each LLM was chosen for its particular task.
-The ambiguous reference to “In the Figure” on page 4 should specify which figure is meant.
-In subsection 3.5, “PDF’s” should be “PDF files”.
-In subsection 3.5, “appraoch” should be “approach”.
-When describing the GENIAL! Basic Ontology, are any other vocabularies used, such as RDF or RDFS?

Section 4: Results
Positive: This section presents the results of different approaches in the proposed framework.
Negative:
-There is no clear definition of the metrics used to evaluate the performance of the framework.
-An ablation study should be conducted to justify the selection of different LLMs for their specific tasks.
-This section lacks a comparison study between the proposed solution and other existing state-of-the-art approaches. This would help position the proposed framework in relation to the existing state-of-the-art.

Section 5: Conclusions
Positive: This section presents the conclusions reached and the future work.
Negative:
-Due to the absence of ablation and comparison studies, the conclusions are limited to stating that the framework can generate KGs, without demonstrating its advantages or contributions relative to prior work.

Review #2
Anonymous submitted on 03/Oct/2025
Suggestion:
Major Revision
Review Comment:

Overall
I see this paper as the authors' first attempt, and it is indeed a valuable experience. However, the paper has serious structural issues and lacks important content, though it does show a valuable research goal (albeit one that requires substantial refinement). I would label this submission as a major revision (considering it is a first attempt), but the authors will need to make extensive revisions.

Abstract
The abstract currently emphasizes how the framework is implemented, but it does not clearly state the motivation (why existing KG generation methods for systems engineering are insufficient). It also does not briefly mention the evaluation methods used, which I believe are equally important.

Introduction
The introduction provides very limited references to support the motivation and research narrative (currently only two, with one being Wikidata). I think this section could be strengthened with more references and a clearer structure. The contributions could be written explicitly as bullet points, followed by a short preview of the evaluation process. Most importantly, the research questions are missing, and they should be written explicitly in the introduction, also in bullet points for clarity.

Related Work
It is somewhat difficult to understand why works on KGs improving LLMs are included. The reviewed works are also very general. Domain-specific works are lacking; if systems engineering studies are limited, related domains such as electronics, automotive, or industrial design could be included to give readers a broader picture.

Section 3.1
It is unclear how the three needs listed in this section were identified, and whether they are based on a formative study or a systematic literature review. If they are based on literature, references should be provided.

Section 3.6 (Frontend)
The frontend description lists components such as the tree view, editing area, and top bar, but it does not explain how users interact with them. In my understanding, a frontend description should at least include both the interface and the interaction techniques. These should ideally be introduced in the context of a system workflow or user scenarios.

Section 3.7 (Backend)
The reasons for the chosen techniques in this section are not clearly explained.

Section 3 (Framework Description)
The structure of Section 3 is hard to follow. It would be helpful if the authors could provide an introductory paragraph under this section that explains the overall logic and flow, with references to the subsection numbers.

Evaluation Section
Between Section 3 and Section 4, there seems to be a missing section that explicitly describes the evaluation process.

Section 4 (Results and Discussion)
The structure of Section 4 is also difficult to follow. For example, the discussion (4.1.1) is included under results (4.1). It would be clearer if all discussions were presented together in a dedicated discussion section after the results section.

Section 4.4 (Reasoning Evaluation)
The evaluation goals or evaluation questions are not clearly stated. Without them, it is difficult to properly assess this section.

Conclusion
The conclusion does not describe the limitations of the methodology used.