Review Comment:
The article discusses the creation of an ontology of Greek mythology (OGM), which in particular supports the modeling of statements about mythological characters attested in the sources. The backbone data for the taxonomy of the classes, the statements and the properties is derived from Wikidata, but a workflow to add a rich set of additional data from other sources is also discussed. The use case for this extension is information extracted from the website of the Theoi project, where useful information is parsed and structured from the natural-language entries in the encyclopedia with the help of NLP tools. The output of the workflow is discussed, and the paper also includes some useful considerations about Wikidata as a source for the ontological representation of a cultural phenomenon as complex and far-reaching as Greek mythology.
As no dedicated ontologies exist for Greek mythology (to my knowledge), the work is original and welcome. The authors are well aware that one of the most important aspects of the domain they are working with is the richness and creativity of the traditional tales, where many alternative versions of the same facts are transmitted in literature, folklore or art. Very wisely, the authors refrain from simplifying this complexity. Their model makes it possible to represent the fact that, for instance, certain statements that should exist only once about a given subject (e.g. a deity having a father) can in fact have multiple instances. (I am not sure, though, that "contradiction", used in the paper for such statements and in the name of the object property "ogm:contradictsWith", is the right word, as it relies on notions of "truth" and "contradiction" that are not very helpful when discussing Greek mythology.)
I find the article interesting and stimulating. I have only two major observations that require revision, both of which concern aspects that are overlooked and must be addressed.
After them, I also list a few minor points where some clarification would be helpful.
## Major observations
1) Some domains that are crucial for our understanding of Greek mythology (and which are also areas where major advancements in LOD for cultural heritage were achieved) are strangely absent from the discussion. The absence of any reference to archaeological materials is particularly glaring! Ancient art is a decisive source of knowledge for Greek mythology; alternative or original "statements" of the type that the authors mine from Wikidata or Theoi are often attested only in the iconography. References to artifacts representing or linked to mythological characters must be supported, in the same way as the OGM supports references to textual sources. There are several online projects with artifacts or images related to mythology, such as the digital LIMC (https://weblimc.org/page/home/Basel) or iDAI.objects (Arachne) of the Deutsches Archäologisches Institut (https://arachne.dainst.org/).
The representation of information related to items in museum collections is also an area of particular importance for SW applications to Cultural Heritage. The primary reference here is to the CIDOC-CRM, which should be mentioned in the section about previous works.
Another class of data that is very relevant for mythology, and should at least be acknowledged, is geodata. Greek myths are often closely associated with specific places, frequently in relation to worship or cult. This is an area where the application of SW technologies has been particularly successful for the History and Archaeology of the Ancient World. It suffices to mention the work of the Pelagios community. As the OGM includes "Location" as one of its classes, some reference to this field is expected. I suggest referring to Pelagios and to some related publications on geodata in the Digital Classics (e.g. H. Cayless, "Sustaining Linked Ancient World Data", https://doi.org/10.1515/9783110599572-004).
2) The authors do not explicitly discuss how they support references to primary (textual) sources for statements. From the discussion in Sec. 4.3.3 and Fig. 6, it seems that the reference is expressed with a data property and a string representing a canonical citation, like "Hymn in Jov. 7, 10". This is not optimal, and I would strongly recommend relying on solutions based on CTS URNs (or at the very least mentioning the possibility). CTS URNs provide a standardized and rather popular way to identify and retrieve portions of texts (see e.g. the discussion in P. Cimiano et al., Linguistic Linked Data, pp. 238-241, https://doi.org/10.1007/978-3-030-30225-2_13). CTS services are implemented by several digital libraries, the Perseus Project in particular, for both the original texts in Greek and Latin and the modern translations. Moreover, as the URNs are based on widely used canonical citations (just like the one exemplified in Fig. 6 and quoted above), it should be relatively easy to generate them by parsing the content of websites like Theoi.
In any case, CTS should at least be mentioned in the paper (cite, for instance, C.W. Blackwell and N. Smith, "The CITE Architecture: a Conceptual and Practical Overview", https://doi.org/10.1515/9783110599572-006).
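To illustrate the kind of mapping I have in mind, here is a minimal sketch in Python. It is only an assumption about how such a step could look, not a description of the authors' pipeline: the function name, the regular expression, the abbreviation strings and the small lookup table are all hypothetical, and a real workflow would need a complete authority list of authors and works. Only simple dotted passage references are handled; a comma-separated reference such as "7, 10" (which may denote two separate lines) is deliberately left unresolved.

```python
import re

# Hypothetical, hand-made lookup table from citation abbreviations to CTS
# work URNs. The abbreviation strings are assumptions for illustration; a
# real pipeline would need a full authority list of authors and works.
WORK_URNS = {
    "Hom. Il.": "urn:cts:greekLit:tlg0012.tlg001",  # Homer, Iliad
    "Hom. Od.": "urn:cts:greekLit:tlg0012.tlg002",  # Homer, Odyssey
    "Hes. Th.": "urn:cts:greekLit:tlg0020.tlg001",  # Hesiod, Theogony
}

# Work abbreviation, whitespace, then a dotted passage such as "1.1" or "116".
CITATION_RE = re.compile(r"^\s*(?P<work>.+?)\s+(?P<passage>\d+(?:\.\d+)*)\s*$")


def citation_to_cts_urn(citation):
    """Return a CTS URN for a canonical citation, or None if it is not
    recognized (unknown work, or a passage format this sketch ignores)."""
    match = CITATION_RE.match(citation)
    if match is None:
        return None
    work_urn = WORK_URNS.get(match.group("work"))
    if work_urn is None:
        return None
    return f"{work_urn}:{match.group('passage')}"


if __name__ == "__main__":
    for text in ["Hom. Il. 1.1", "Hes. Th. 116", "Hymn in Jov. 7, 10"]:
        print(text, "->", citation_to_cts_urn(text))
```

Citations that cannot be resolved return `None`, so they could be kept as plain strings (as in the current model) while the resolved ones are stored as URNs.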
## Minor observations
* Section 1: though I am personally well aware of the importance of Greek mythology, it would be helpful if the authors mentioned at least a couple of foreseen applications for their ontology. Studies in comparative mythology? SW publications in literature and art? Museums? etc.
* Sec. 3.2.1, and passim: Theoi is a good use case, but how portable are the results to other encyclopedias? Among them, I would mention the digital version of Smith's "Dictionary of Greek and Roman biography and mythology" (1873), published by the Perseus Project, especially because some information (in particular the links to the textual sources in the Perseus DL) is already structured:
http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.04.0104
* Sec. 4.3.5: some classes of statements, not found in Wikidata, were added in the "proof-of-concept" stage (e.g. ogm:CaregiverStatement). But on what grounds were these facts (like the role of a caregiver) isolated and identified? Taxonomies of folktale motifs? Observation of recurring themes?
* P. 11, col. 1, lines 3-4: "OOPS! identified some minor class overlap issues." Such as? This statement is too generic to be acceptable in this form; the specific issues reported should be listed.