From Manuscripts to Knowledge Graphs: Automating the Semantic Representation of Tafsir al-Tabari

Tracking #: 3750-4964

Authors: 
Amna Binte Kamran
Amna Basharat
Misbahur Rehman

Responsible editor: 
Guest Editors KG Construction 2024

Submission type: 
Full Paper
Abstract: 
Tafsir, the exegesis of the Quran, constitutes a fundamental pillar of Islamic knowledge, providing critical insights into the interpretation and application of Quranic teachings. Rooted in the teachings of Prophet Muhammad and early Islamic scholars, Tafsir elucidates the Quranic verses through historical, linguistic, and jurisprudential lenses. Despite its profound significance, systematic semantic modelling and digital representation of Tafsir literature remain unexplored. This paper presents the design, development, and implementation of a SemanticTafsir ontology and knowledge graph, focusing on the comprehensive exegesis by Muhammad Ibn Jarir al-Tabari. Following established ontology engineering methodologies, we define the SemanticTafsir ontology to capture and interlink key concepts within Tafsir literature. By leveraging existing ontologies such as SemanticHadith, Schema.org, and DBpedia, our approach ensures semantic coherence and interoperability. The core contribution of this work lies in the automation of the semantic representation of Tafsir literature. We develop a pipeline that automates the conversion of TEI format manuscripts of Tafsir into an RDF-based knowledge graph, integrating Quranic verses, hadith, commentaries, and thematic connections. This automation facilitates enhanced accessibility and analysis of Tafsir literature. The evaluation encompasses logical consistency, competency question resolution, and semantic fidelity, validating the ontology's robustness and applicability. The SemanticTafsir framework supports SPARQL queries, enabling both technical and non-technical users to explore and derive insights from the interconnected layers of Quranic exegesis. This research contributes to Islamic knowledge engineering by advancing digital preservation, accessibility, and scholarly engagement with Tafsir literature, thereby enriching the global understanding of Islamic knowledge. 
The SemanticTafsir ontology and knowledge graph are freely accessible at https://github.com/A-Kamran/SemanticTafsir.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Antonello Meloni submitted on 26/Oct/2024
Suggestion:
Accept
Review Comment:

1. Originality
The manuscript presents an original contribution by automating the transformation of TEI-encoded Tafsir manuscripts into a knowledge graph (KG), a novel approach within Islamic studies and digital humanities. This effort to create a SemanticTafsir ontology enhances accessibility and understanding of the Tafsir literature, filling a gap in digital knowledge representation for Quranic exegesis.

2. Significance of Results
The results are highly significant as they enable semantic search and exploration of interconnected concepts within Tafsir literature. The approach could advance both Islamic scholarship and digital preservation efforts, broadening accessibility and understanding of these texts.

3. Quality of Writing
The paper is clearly and meticulously written, with well-structured explanations of the KG development process and challenges. The authors outline their methodology comprehensively, ensuring readers can follow the procedural steps and reasoning behind the ontology’s design and evaluation.

Data File Assessment
A. Organization: The data file is generally well-organized but lacks full accessibility due to non-working links in the README.
B. Completeness for Replication: The broken links to the SPARQL endpoint and documentation hinder full replication.
C. Repository Choice: The repository on GitHub is appropriate for long-term accessibility and discoverability.
D. Completeness of Artifacts: Aside from the link issues, the artifacts appear complete and sufficiently described.
Suggestions for Improvement
To strengthen the reproducibility and usability of the framework, I recommend the authors ensure all links, particularly the SPARQL endpoint and documentation in the GitHub repository, are active and correctly referenced.

Review #2
Anonymous submitted on 14/Nov/2024
Suggestion:
Reject
Review Comment:

This paper presents the construction of an ontology and a knowledge graph representing Tafsir al-Tabari. A Tafsir is a body of commentary and explication, aimed at explaining the meanings of the Qur'an. Muhammad ibn Jarir al-Tabari is a Persian scholar.
This work is the continuation of a work published in the Semantic Web journal on the construction of a knowledge graph representing the hadith corpus. A hadith is an action of the prophet Mohammed, reported by a chain of narrators.
In general, the paper is well written and pedagogically presented and its content fits in well with the topics of the journal and the produced KG is interesting for cultural heritage.
However, the paper has several major shortcomings:
First, there is a lot of repetition and generalities in the article. The content could be presented in a much shorter paper.
Second, there is an overlap with the paper previously published by the authors in the SWJ on the SemanticHadith KG. There is for sure a delta but it is unclear, e.g. the new ontology comprises terms from the previous one, some competency questions are the same, the general structure of the papers and positioning are very similar.
Third, I could not access the SPARQL endpoint (nor that of the SemanticHadith), and I could not find the KG on GitHub.
Fourth, the SPARQL queries implementing the competency questions are not discussed in the paper (nor in the previous paper). There is no evaluation of the produced KG (nor of the SemanticHadith KG, which is left for future work in the published paper).

Detailed comments:

Introduction
P2 lines 6-7: there should not be a distinction here between Linked Data applications and KG applications; medicine should go in the list of domains on line 4.

Background
This section presents some redundancies with the introduction, in particular sub-section 2.4. Merging both sections would avoid it.

SemanticTafsir Ontology
The positioning wrt the Semantic Hadith ontology is unclear.
P5 line 48 a namespace should be chosen (whatever the prefix), which is not given in the paper. The ontology should be published, with dereferenceable URIs.
P6 The overlap between competency questions that can be answered on the SemanticHadith and those specific to the Semantic Tafsir should be clearly stated.
P8 “concept classes” is not correct, nor “object type properties”, and a class does not comprise properties.
P9 Modelling decisions: similar to that of the previous paper.
Access to refs 62 and 63 dates back to 2022. The references are not complete and the date is not correct.
The modelling choice is very debatable, in my view there should be 3 subclasses of Narrator, and narrator individuals should be declared as instances of them or of the superclass Narrator.
P10 Figure 3 does not reflect the n-ary patterns: there should be a refersTo relation between a ThematicVerseFragmentReference and a VerseFragment, and property hasSubTheme should be hasTheme.
Also in my view the model lacks a thesaurus for special individuals: hadith, narrators, etc.

Methodology
The text in lines 27-44 is far too long and describes notions that are obvious to SWJ readers.
Subsection 4.1 should go in section 3
P11 lines 23-26 the text is redundant with what is already detailed in sections 1 and 2.
Section 4.2 is redundant with what is already described in section 3.
P12-13 data class, concept class, object class do not make sense, nor a class equipped with properties.
Section 4.3 should be shortened. Technical stuff should not be described. Obvious things should just be deleted, e.g. subsection 4.3.1 and most of 4.3.2. Subsection is not precise enough: which tools have been used? Who are the experts? Section 4.4 consists of generalities about GitHub that can be avoided.

Results and Discussion
Subsection 5.1 presents technical details that are not challenges to be discussed.
What is described in subsection 5.1 is not an evaluation; it is the usual stuff in ontology engineering that should not be described in a SWJ paper.
The statistics on the KG in Table 2 should go in Section 4. In this table, 3 million axioms must be a typo. The table should report the numbers of links between classes and between individuals (with a clear distinction).
A presentation of the SPARQL queries implementing the competency questions and a discussion on their results are missing.
The paper would benefit from a presentation of interesting outputs in the field of Islamic studies produced by querying the KG.
There are many assertions which are not precise enough, e.g. “significant advancement”: what are the KPIs that make it possible to measure it? “revealing underlying patterns in the Tafsir’s interpretative framework”: which patterns are they?
How the visualisations were created from the KG is not discussed.
The text in subsection 5.4 before and at the beginning of subsection 5.4.1 is far too general, redundant with the first sections.

Conclusion
I disagree with the assertion that a “rigorous evaluation” was presented.

In my view, the paper is better suited to a publication in a journal in Digital Humanities.

Review #3
By Bruno Almeida submitted on 21/Nov/2024
Suggestion:
Reject
Review Comment:

This manuscript describes the creation of a knowledge graph of Tafsir literature based on an ontology and on a TEI dataset of texts. The underlying topic is contextualized, and the work carried out has great significance for digital humanities and Islamic studies. Unfortunately, while the OWL file of the ontology described in the manuscript is made available through the project’s GitHub repository, the knowledge graph itself is not, contrary to what is stated in the Section “Data availability”: “Ontology, Knowledge Graph, ontology documentation, SPARQL Queries corresponding to Competency Questionstions, MIRO report are available at https://github.com/A-Kamran/SemanticTafsir” (p. 20). Somewhat confusingly, the data and code made available through the GitHub repository are referred to earlier in the text as a “reference implementation”: “The resultant ontology and knowledge graph are hosted in a persistent public data repository, ensuring accessibility and facilitating future development. A reference implementation is available in a public GitHub repository, serving as a platform for issue tracking and community engagement” (p. 13). The provided link to the documentation within the GitHub repository (https://a-kamran.github.io/SemanticTafsir/) returns a 404 error, so there is no available documentation. There is no further information about the Python and Jupyter Notebook files available in the GitHub repository beyond a ‘readme’ file instructing users to change a couple of variables in the config.py file, but the main notebook file still throws several errors about missing files/incorrect paths. In any case, the manuscript does not even mention the code or how to reproduce the results presented in Section 5.

There are also several issues regarding the provided OWL file that lead me to believe that the ontology described in the manuscript is a different version from that which is made available through the GitHub repository. For example, the ontology metrics mentioned in Section 3.6 (p. 8) and presented in Table 2 (p. 16) do not match the information presented in Protégé for the provided ontology file. Specifically, Section 3.6 refers to “36 concept classes, comprising 40 object type properties and 18 datatype properties” (p. 8), but the ontology file actually has 31 classes and 37 object properties. In the manuscript, there is mention of “facets” regarding the data properties: “The data properties of the ontology have different facets. These facets consist of data type, restrictions, cardinality, and other features of values for the properties” (p. 9). Yet the declared data properties have no description whatsoever beyond being subproperties of owl:topDataProperty. In Section 3.7, the modeling decision of “Values as sets of individuals” is described: “We define a class NarratorType, which enumerates the individuals sahabi, rawi, shaykh, and unknown-rawi” (p. 9). No such individuals are declared in the ontology file. In fact, the file does not contain any description of individuals.
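Metric mismatches of this kind can be checked independently of Protégé. The following is a minimal sketch, assuming the ontology is serialized as RDF/XML with entities declared as typed node elements (the form Protégé normally writes). The function name count_declared_entities, the SAMPLE document, and the example.org IRIs are hypothetical and are not taken from the SemanticTafsir repository. Counts may differ slightly from Protégé's metrics, for instance where entities are only referenced and never declared.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def count_declared_entities(rdfxml_text):
    """Count distinct named entities declared in an RDF/XML ontology string."""
    root = ET.fromstring(rdfxml_text)
    kinds = {
        "Class": set(),
        "ObjectProperty": set(),
        "DatatypeProperty": set(),
        "NamedIndividual": set(),
    }
    for kind, seen in kinds.items():
        # Typed node elements look like <owl:Class rdf:about="...">.
        for element in root.iter(f"{{{OWL}}}{kind}"):
            iri = element.get(f"{{{RDF}}}about")
            if iri:  # skip anonymous class expressions, which have no rdf:about
                seen.add(iri)
    return {kind: len(seen) for kind, seen in kinds.items()}


# Hypothetical toy ontology, used only to illustrate the check.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/onto#Narrator"/>
  <owl:Class rdf:about="http://example.org/onto#Verse"/>
  <owl:ObjectProperty rdf:about="http://example.org/onto#refersTo"/>
  <owl:DatatypeProperty rdf:about="http://example.org/onto#hasText"/>
</rdf:RDF>"""

counts = count_declared_entities(SAMPLE)
print(counts)
```

A NamedIndividual count of zero from such a script would correspond to the observation that the file declares no individuals.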

Furthermore, there are several bad modeling choices with the ontology. For example, the alignment properties of OWL, equivalentClass and equivalentProperty, are declared as annotation properties in the OWL file:

owl:equivalentClass rdf:type owl:AnnotationProperty
owl:equivalentProperty rdf:type owl:AnnotationProperty

The alignments to external classes and properties function in SemanticTafsir as metadata descriptions, with external URIs appearing as language-tagged literals (e.g. "http://schema.org/Place"@en). This hinders interoperability with external ontologies and datasets, which is a stated objective of the work described in the manuscript. Alignments with Wikidata entities are simply wrong, since they use the URLs of Wikidata pages (e.g. https://www.wikidata.org/wiki/Q215627) rather than entity URIs (e.g. http://www.wikidata.org/entity/Q215627).
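Both alignment problems can be detected mechanically. The sketch below assumes the alignments can be inspected as Turtle text. The helper names to_wikidata_entity_uri and find_iri_literals and the ex: prefix are hypothetical, and the regular expressions are a rough illustration rather than a full Turtle parser.

```python
import re

# Matches Wikidata item *page* URLs such as https://www.wikidata.org/wiki/Q215627.
WIKIDATA_PAGE = re.compile(r"^https?://www\.wikidata\.org/wiki/(Q[1-9]\d*)$")

# Matches a quoted http(s) IRI, optionally followed by a language tag like @en.
IRI_LITERAL = re.compile(r'"(https?://[^"\s]+)"(?:@[A-Za-z-]+)?')


def to_wikidata_entity_uri(url):
    """Rewrite a Wikidata page URL to the entity URI; return other input unchanged."""
    match = WIKIDATA_PAGE.match(url.strip())
    if match is None:
        return url
    return f"http://www.wikidata.org/entity/{match.group(1)}"


def find_iri_literals(turtle_text):
    """Return IRIs that appear as string literals instead of as resources."""
    return IRI_LITERAL.findall(turtle_text)


# Hypothetical triple mirroring the pattern described in the review.
snippet = 'ex:Place owl:equivalentClass "http://schema.org/Place"@en .'
print(find_iri_literals(snippet))
print(to_wikidata_entity_uri("https://www.wikidata.org/wiki/Q215627"))
```

Each literal found this way would need to be re-asserted as a resource (for example `<http://schema.org/Place>` in Turtle) for a reasoner or a federated query to treat it as an alignment.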

Since the results described in the manuscript cannot be reproduced, I cannot recommend the publication of the manuscript, even if the above-mentioned issues with the ontology were corrected.