Greek Mythology as a Knowledge Graph: From Chaos to Zeus and Beyond

Tracking #: 2754-3968

Authors: 
Juan-Antonio Pastor-Sánchez
Efstratios Kontopoulos
Tomás Saorín
Thomas Bebis
Sándor Darányi

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Full Paper
Abstract: 
Greek mythology has been exerting a lasting influence on Western culture, but a corresponding ontology has been missing from the Semantic Web until now. To remedy this deficiency, 34 of the 283 properties used by 5377 Wikidata items were selected to generate a first version of an Ontology of Greek Mythology (OGM). This limited set of properties was used to define a set of classes to instantiate the descriptions of the individuals according to reification requirements. The ontology also includes the representation of contradictions between statements, a well-known symptom of classical storytelling. A retrieval tool was added that uses the Wikidata Query Service through SPARQL queries to display and download results in various formats, thereby developing OGM into a scholarly tool. Further, as Wikidata has little information about the classical sources grounding the truth of its statements, we tested a semantic enrichment workflow to extract additional statement types from source texts in the ‘Theoi Project’ as statement anchors. This workflow experiment proved necessary to go beyond Wikipedia to address mythological complexities in a knowledge graph, but, as discussed in the article, its scalable automation requires further development.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Francesco Mambrini submitted on 19/May/2021
Suggestion:
Major Revision
Review Comment:

The article discusses the creation of an ontology of Greek mythology (OGM), which in particular supports the modeling of statements about mythological characters attested in the sources. The backbone data for the taxonomy of the classes, the statements and the properties is derived from Wikidata, but a workflow to add a rich set of additional data from other sources is also discussed. The use case for this extension is information extracted from the website of the Theoi Project, where useful information is parsed and structured from the natural-language entries in the encyclopedia with the help of NLP tools. The output of the workflow is discussed, and the paper also includes some useful considerations about Wikidata as a source for the ontological representation of a complex and far-reaching cultural phenomenon such as Greek mythology.

As no dedicated ontologies exist for Greek mythology (to my knowledge), the work is original and welcome. The authors are well aware that one of the most important aspects of the domain they are working with is the richness and creativity of the traditional tales, where many alternative versions of the same facts are transmitted in literature, folklore or art. Very wisely, the authors refrain from simplifying this complexity. Their model makes it possible to represent the fact that, for instance, certain statements that should exist only once for a given subject (e.g. a deity having a father) can in fact have multiple instances. (Though I am not sure that the word "contradiction", used in the paper for such statements and in the name of the object property "ogm:contradictsWith", is the right word, as it relies on notions of "truth" and "contradiction" that are not very helpful when discussing Greek mythology.)
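The point about statements that "should exist only once" can be made concrete with a small sketch. It assumes that statements are already reified as (subject, statement type, object, source) records and that the set of single-valued statement types is supplied by hand; the class names `FatherStatement` and `SiblingStatement` and the function `contradiction_candidates` are hypothetical, and the grouping rule is an illustration, not OGM's actual procedure for asserting `ogm:contradictsWith`.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Statement:
    """A reified statement: subject, kind of claim, object, and attesting source."""

    subject: str
    kind: str  # hypothetical statement-class name, e.g. "FatherStatement"
    obj: str
    source: str


def contradiction_candidates(statements, single_valued):
    """Return pairs of statements that share a subject and a single-valued kind
    but name different objects.

    Each pair is only a candidate for a link such as ogm:contradictsWith;
    whether it records a genuine alternative version is left to a scholar.
    """
    groups = defaultdict(list)
    for s in statements:
        if s.kind in single_valued:
            groups[(s.subject, s.kind)].append(s)
    pairs = []
    for group in groups.values():
        # combinations() yields each unordered pair once, in input order.
        for a, b in combinations(group, 2):
            if a.obj != b.obj:
                pairs.append((a, b))
    return pairs


statements = [
    Statement("Aphrodite", "FatherStatement", "Uranus", "Hesiod, Theogony"),
    Statement("Aphrodite", "FatherStatement", "Zeus", "Homer, Iliad"),
    Statement("Aphrodite", "SiblingStatement", "Ares", "Homer, Iliad"),
]

for a, b in contradiction_candidates(statements, {"FatherStatement"}):
    print(f"{a.subject}: {a.obj} ({a.source}) vs {b.obj} ({b.source})")
```

Only the two parentage statements are paired here; the sibling statement is ignored because its type is not declared single-valued, which is exactly the kind of modeling decision the word "contradiction" hides.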

I find the article interesting and stimulating. I have only two major observations that require revision; both concern aspects that are overlooked and must be addressed.

After them, I also list a few minor points where some clarification would be helpful.

## Major observations

1) Some domains that are crucial for our understanding of Greek mythology (and which are also areas where major advancements in LOD for cultural heritage have been achieved) are strangely absent from the discussion. The absence of any reference to archaeological materials is particularly glaring! Ancient art is a decisive source of knowledge for Greek mythology; alternative or original "statements" of the type that the authors mine from Wikidata or Theoi are often attested only in the iconography. References to artifacts representing or linked to mythological characters must be supported, in the same way as the OGM supports references to textual sources. There are several online projects with artifacts or images related to mythology, such as the digital LIMC (https://weblimc.org/page/home/Basel) or the iDAI.objects (Arachne) of the Deutsches Archäologisches Institut (https://arachne.dainst.org/).
The representation of information related to items in museum collections is also an area of particular importance for SW applications to Cultural Heritage. The primary reference here is to the CIDOC-CRM, which should be mentioned in the section about previous works.
Another class of data that is very relevant for mythology, and should at least be acknowledged, is geodata. Greek myths are often closely associated with specific places, frequently in relation to worship or cult. This is an area where the application of SW technologies has been particularly successful for the History and Archaeology of the Ancient World; it suffices to mention the work of the Pelagios community. As the OGM includes "Location" as one of its classes, some reference to this field is expected. I suggest referring to Pelagios and to some related publications on geodata in the Digital Classics (e.g. H. Cayless, "Sustaining Linked Ancient World Data", https://doi.org/10.1515/9783110599572-004).

2) The authors do not explicitly discuss how they support references to primary (textual) sources for statements. From the discussion in Sec. 4.3.3 and Fig. 6, it seems that the reference is identified with a data property and a string representing a canonical citation, like "Hymn in Jov. 7, 10". This is not optimal, and I would strongly recommend relying on solutions based on CTS URNs (or, at the very least, mentioning this possibility). CTS URNs provide a standardized and rather popular way to identify and retrieve portions of texts (see e.g. the discussion in P. Cimiano et al., Linguistic Linked Data, pp. 238-241, https://doi.org/10.1007/978-3-030-30225-2_13). CTS services are implemented by several digital libraries, the Perseus Project in particular, for both the original texts in Greek and Latin and the modern translations. Moreover, as the URNs are based on widely used canonical citations (just like the one exemplified in Fig. 6 and quoted above), it should be relatively easy to generate them by parsing the content of websites like Theoi.
In any case, CTS should at least be mentioned in the paper (cite, for instance, C.W. Blackwell and N. Smith, "The CITE Architecture: a Conceptual and Practical Overview", https://doi.org/10.1515/9783110599572-006).
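The suggestion that canonical citations could be turned into CTS URNs can be illustrated with a small sketch. It assumes the usual CTS URN shape (the `urn:cts:` prefix, a namespace, a dot-separated textgroup/work/version/exemplar hierarchy, and an optional passage); the `WORKS` lookup table and the function names are hypothetical, only the Iliad (`tlg0012.tlg001`) is filled in, and subreferences (`@...`) are not handled. A real mapping from Theoi-style citations such as "Hymn in Jov." would need one curated entry per author and work.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from abbreviated canonical references to a CTS
# namespace and work identifier; a real table would cover every work cited.
WORKS = {"Il.": ("greekLit", "tlg0012.tlg001")}


@dataclass(frozen=True)
class CtsUrn:
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    exemplar: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split urn:cts:<ns>:<textgroup>[.<work>[.<version>[.<exemplar>]]][:<passage>]."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    ids = parts[3].split(".")
    if not 1 <= len(ids) <= 4 or not all(ids):
        raise ValueError(f"bad work component in {urn!r}")
    # Missing levels of the work hierarchy are represented as None.
    padded = ids + [None] * (4 - len(ids))
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(parts[2], *padded, passage)


def citation_to_urn(abbrev: str, passage: str) -> str:
    """Build a passage-level URN from an entry of the hypothetical WORKS table."""
    namespace, work = WORKS[abbrev]
    return f"urn:cts:{namespace}:{work}:{passage}"


urn = citation_to_urn("Il.", "1.1")
print(urn, parse_cts_urn(urn))
```

Because the URN keeps the passage separate from the work hierarchy, the same reference can later be resolved against any CTS-compliant library that holds an edition or translation of that work.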

## Minor observations

* Section 1: though I am personally well aware of the importance of Greek mythology, it would be helpful if the authors mentioned at least a couple of foreseen applications for their ontology. Studies in comparative mythology? SW publications in literature and art? Museums? etc.

* Sec. 3.2.1, and passim: Theoi is a good use case, but how portable are the results to other encyclopedias? Among them, I would mention the digital version of Smith's "Dictionary of Greek and Roman biography and mythology" (1873), published by the Perseus Project, especially because some of its information (in particular, the links to the textual sources in the Perseus Digital Library) is already structured:
http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.04.0104

* Sec. 4.3.5: some classes of statements not found in Wikidata were added in the "proof-of-concept" stage (e.g. ogm:CaregiverStatement). But on what grounds were these facts (like the role of a caregiver) isolated and identified? Taxonomies of folktale motifs? Observation of recurring themes?

* Page 11, col. 1, lines 3-4: "OOPS! identified some minor class overlap issues." Such as? This statement is too generic to be acceptable in this form.

Review #2
Anonymous submitted on 02/Jun/2021
Suggestion:
Major Revision
Review Comment:

This paper describes the creation of a Knowledge Graph representing ancient Greek mythology, intended as a resource for the comparative study of mythological heritage in relation to its sources (namely, ancient and contemporary sources and bibliographical references). As the authors point out in the introduction, the corpus of Greek mythology is intrinsically contradictory and has multiple sources, so the outcome of the research must be understood as an ongoing resource. Hence, this work emphasizes the methodology rather than the results. Even so, its main limitation is the lack of a proper evaluation; in addition, the resource is partly available only through the demonstrator linked in the paper, rather than through standard channels. The scripts employed in the NLP pipeline are not available.
Concerning the originality of the work, the Related Work section mentions some similar efforts (Declerck et al. 2017) and an existing ontology encoding the same domain (Syamili, C., & Rekha 2018), but falls short of discussing the originality of this work with respect to the existing ontology and the novelty of the NLP methodologies.

The paper is well written, but it lacks examples to illustrate the design of the taxonomy (and ontology), which is entirely targeted at Wikidata as a source, and it omits details of the NLP pipeline, which is described in an anecdotal way.
In Section 3, the discussion of the requirements should be kept separate from the limitations of Wikidata, and an overview of the pipeline described in the subsections should be provided in advance to clarify the subsequent description. In particular, when the authors mention the results obtained at each step, a discussion of those results should be given. For example, the sentence "the analysis of the query results showed that item Q34726 is not used in Wikidata to organize items about Greek mythology" is a bit shallow, as the query appears to return more than 2800 rows. Side note: provide a natural-language description of the queries rather than only the SPARQL code, as it is hard to remember the labels of all classes and properties. In some cases, the output of a step is simply resolved by resorting to some kind of implicit alignment, as in the case of the properties extracted from Wikidata, which were filtered using the existing ontology of Greek mythology. In the spirit of Linked Data, this choice should be implemented through an explicit alignment (why not use SKOS?). Conversely, I appreciated the idea of using SKOS to encode the taxonomy extracted from Wikidata, but I think this choice should be discussed in light of the lack of reasoning tools for this kind of representation. Another weak point is that the ontology, despite the presence of complex event patterns (such as giving birth), is not aligned to foundational ontologies or high-level patterns. Again, I think this choice is debatable, but possibly defensible in light of the applications, which are not described.
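One way to discuss the 2800-odd rows mentioned above would be a per-property breakdown of what points to Q34726. The sketch below is not the authors' query: it assumes the public Wikidata Query Service endpoint `https://query.wikidata.org/sparql` and its `format=json` request parameter, restricts the count to direct (`wdt:`) properties, and only builds the request URL without sending it; `QUERY` and `build_wdqs_url` are hypothetical names.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: for every direct ("truthy") wdt: property ?p, count how
# many subjects link to Q34726 through it, most frequent property first.
QUERY = """
SELECT ?p (COUNT(?item) AS ?n) WHERE {
  ?item ?p wd:Q34726 .
  FILTER(STRSTARTS(STR(?p), STR(wdt:)))
}
GROUP BY ?p
ORDER BY DESC(?n)
"""


def build_wdqs_url(query: str, fmt: str = "json") -> str:
    """Return a GET URL asking WDQS to run `query` and answer in `fmt`."""
    params = urlencode({"query": query, "format": fmt})
    return f"{WDQS_ENDPOINT}?{params}"


if __name__ == "__main__":
    print(build_wdqs_url(QUERY))
```

Actually sending the request should also set a descriptive User-Agent header, as Wikimedia's User-Agent policy asks of automated clients.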

The dynamic accrual of the resource through the linking of assertions to external sources is another original goal of this work. Again, in this case, adopting standard patterns and vocabularies to connect the assertions on entity-to-entity relations to the external sources (or, better, the source provided by the Theoi Project) would be useful: think, for example, of PROV-O, which allows one to track not only the relation with supporting texts but also the human and software responsibility for the extraction (a good opportunity for this project). However, the main shortcoming of this section is that it does not provide any type of qualitative (expert review? sample verification?) or quantitative evaluation of the extracted references to the quoted text excerpts, although we can consider the Onlogizer view as a partial validation.
Finally, a relevant observation by the authors concerns the frequent contradictions between sources, so I expected that some explicit "contradicts" relation would be introduced or that some comparative example would be provided.

To summarize, this paper describes a complete pipeline for designing and populating the Knowledge Graph in a domain which is very relevant (see also: Highet, G. 1949. The Classical Tradition: Greek and Roman Influences on Western Literature. Oxford: Oxford University Press.), but it is difficult to assess the soundness and coverage of the resource, as the description tends to be self-referential and the resource itself cannot be obtained through standard channels.