How Graph and Ontology May Add Value to Transactional Data

Tracking #: 3056-4270

Minchul Lee
Boonserm Kulvantunyou
Scott Nieman
Bongjun Ji
Nenad Ivezic
Hyunbo Cho

Responsible editor: Guest Editors Ontologies in XAI

Submission type: Full Paper
In the era of the data economy, businesses seek to make data smarter, that is, easier to analyze for insights. They have to deal with data from multiple sources. One data architecture for integrating data from these sources is the data lake. A data lake captures all the data crisscrossing the enterprise in a single repository for easy, low-cost access in real time or near real time without actively syncing data from the sources. In this paper, we investigate how transactional data stored in a data lake may be integrated and queried for business insights. The assumptions are that the data follow a common information exchange standard in XML syntax and that the storage behind the data lake is a NoSQL database. Three experiments were conducted on logistics data: 1) using only the NoSQL native API to answer the query of interest; 2) translating raw XML data into graph data without introducing formal semantics beyond what is already available in the corresponding XML schemas, and using SPARQL to answer the query of interest; and 3) introducing a reasoner and additional formal semantics via an OWL ontology into the architecture, and using SPARQL based on the ontology to answer the query of interest. Each experiment incurs increasing pre-processing effort; their differences and value are analyzed and discussed.


Solicited Reviews:
Review #1
By Luís Ferreira Pires submitted on 21/Mar/2022
Major Revision
Review Comment:

This paper discusses three experiments performed by the authors to show how RDF/RDFS and OWL could be beneficial to facilitate the query of a NoSQL database. The experiments and the conclusions are quite interesting, but the paper needs a lot of reworking to become fit for publication. I think that at least the following points must be improved in this paper:
- Strangely enough, the title of the paper refers to ‘Transactional Data’, but transactions do not appear to play any major role in the paper, or the authors forgot to stress the transactional aspects of this work. I suggest that the authors either remove the term ‘transactional data’ from the title (and the abstract) or give more attention to transactional issues in the paper.
- I find it quite strange that the authors wrote a section to introduce XML, RDF and OWL in a paper for a specialised journal like SWJ. Furthermore, the introduction is not really accurate, by stating that ‘XML conveys meaning of data’ and ‘OWL is based on RDF’. I suggest that the authors use this section only to discuss how these techniques have been used in the paper, instead of introducing them in tutorial style. The authors can safely assume that the average reader is acquainted with these techniques.
- The initial model representing a Shipment Request and the Carrier Route are crucial for understanding the experiments and the conclusions. Yet they are presented as (incomplete) XML schemas with lots of typos. Furthermore, they apparently comply with the OAGIS standard, which may not be known to all the readers. I suggest that the authors pay much more attention to the presentation of these models, for example by showing the main concepts of the XML schema and their relations in a UML class diagram and explaining the consequences of following the OAGIS standard in the elaboration of the XML schemas. This will make it easier for the reader to follow the explanations and appreciate the benefits identified in the later sections of the paper.
- The paper formatting went wrong because some floating figures have been misplaced. Furthermore, the paper is full of grammatical mistakes and typos that have to be fixed in a possible final version of the paper.
- In the first experiment, the authors give a screenshot of the Java code that should return the route, but the actual code is not shown (the screenshot has only the main method and the signature of the method). To be useful, the code of the method should be shown, but it may be even better to give the pseudocode of the method's control flow, since the details of the data structure would then become clearer to the reader.
- In experiment 2, the authors give big chunks of (incomplete) XML code to show how the transformation works. I suggest having the complete XML code moved to an appendix, and concentrating on the clarification of the transformation rules, possibly with code fragments and diagrams.
- Experiment 3 relies on the OWL ontology developed by the authors. However, this ontology is not properly discussed in the paper nor justified. I suggest that the authors spend some effort to introduce the classes and properties defined in the ontology and justify them so that the ontology can be well understood by the readers. This can also help people who want to apply the same approach to examples in other domains to confirm or refute the results of this paper.
- The main conclusion of this paper is that there are benefits of using RDF/RDFS and OWL to improve the access to data stored in NoSQL databases, but with some associated costs. Although the benefits have been illustrated with the experiments, the costs have not been properly quantified. I suggest that the authors pay more attention to the quantification of the costs associated with the use of these technologies, in terms of lines of code or modelling / development time, so that the comparison is fairer and more complete.

Concerning the evaluation criteria, my evaluation is the following:
(1) originality: there have been many publications that investigate the benefits of the semantic technologies to improve queries, thus the topic is not so original.
(2) significance of the results: this paper could have more significant results if it is improved on the lines of my comments above, but in its current form its significance is rather limited.
(3) quality of writing: the paper requires a thorough textual revision to make it fit for publication.

Although the paper refers to many data artifacts (databases, RDF/RDFS files, OWL ontology, code to perform model transformations, etc.), no reference has been made to a repository that holds these artifacts. Therefore, the paper does not comply with the “Long-term stable URL for resources” requirement.

I have more detailed remarks on the paper (textual and content-related) that are handwritten on the hard copy and can be delivered on request. In this report, I focused on more general remarks.

Review #2
By Janna Hastings submitted on 07/Apr/2022
Review Comment:

This article reports on a series of experiments about how an in-house "data lake" containing data from diverse sources relevant for transportation management can be semantically structured on a scale from semantically poor (XML) through RDF to a semantically rich OWL representation. The representations are evaluated for ease of interlinking data from different origin systems in overarching queries. The article makes a good case for the utility of the semantically enhanced representations, and as such is potentially within scope for the Semantic Web Journal. However, I cannot find any applicability to the special issue "Ontologies in Explainable AI"; as far as I can tell, the article does not discuss any AI system even in the broadest sense - even the OWL reasoning that is used is very limited.

Specific comments according to the review template are below:

(1) originality,

The originality of the article is somewhat limited, as RDF and OWL have been proposed in many different domains for structuring large-scale in-house knowledge resources. The way that the different levels of semantics are evaluated along a continuum is the most original contribution and may be of interest for the SWJ readership.

(2) significance of the results, and

The findings will be of interest to those who are managing legacy knowledge bases, particularly in the manufacturing and shipping domains.

(3) quality of writing.

The article is generally well written although I would stylistically have preferred less introductory material (overviews of XML, RDF, OWL etc.) and I am not sure about the very large screenshots of XML and RDF content. These are matters of stylistic preference though.

(4) data availability

There does not appear to be any associated data file with this submission.

Review #3
Anonymous submitted on 12/Apr/2022
Review Comment:

The article was submitted to the special issue on ontologies for XAI. I have not found any kind of relationship between the article and XAI. I understand that an article of this type must present some technique, methodology or method that tries to improve the understanding of non-explainable models or black boxes (usually machine learning ones).

The article tries to improve the interpretability of structured data with ontologies, which I think is quite trivial; if the objective is to enrich structured data with ontologies, this is of course the logical result, in my opinion. The experiments are quite simple. The article is interesting, but I don't think it reaches the level of a journal; it is closer to that of a conference paper or poster.

(1) originality (poor)
(2) significance of the results (poor)
(3) quality of writing (good)