Review Comment:
*** Overview ***
This paper investigates three different approaches of querying the data in a data lake: (1) by directly using the extracted raw XML data as is, (2) by transforming the raw data into a more abstract RDF/RDFS representation, and (3) by coupling (an adaptation of) the RDF/RDFS representation with an OWL/SWRL domain ontology.
*** Overall Impressions ***
The paper is essentially one big running example: it takes a logistics use case and describes three different ways of dealing with the information stored in a data lake. I appreciated that the three scenarios are described in full detail; however, I miss the general picture of the work. In other words, it is extremely unclear to me what the actual contribution of the paper is.
In fact, the work does not propose a framework, does not discover new formal results, does not propose a system or an architecture, does not propose a methodology, and although the authors named the three parts of the running example "experiments", they are really just examples and not really empirical evaluations.
The authors use the running example to substantiate some claims (e.g., that the XML data is more accessible to general programmers than RDF). I feel that this "example-driven" methodology is not adequate to substantiate the claims, and that a proper empirical evaluation grounded on real-world data is really needed here.
The example itself is not devoid of problems. In fact, the way RDF, RDFS, OWL and SWRL have been applied is sometimes not so direct, and the authors do not really justify their translation rules. I provide an incomplete list of the issues I found in the "Detailed Comments" paragraph below.
Summing up, I do not believe this submission falls in the scope of the SWJ, category "full paper", as it looks more similar to a "tutorial" on the use of SW technologies rather than a scientific contribution.
*** Originality ***
I do not see anything original. The literature is full of approaches for mapping XML to RDF, yet none of them is referred to in the manuscript.
*** Significance of the Results ***
This work contains some considerations about a running example, which I would not consider scientific results.
*** Quality of Writing ***
At the level of language, I would say the paper is not badly written. However, there are some layout choices which I found annoying as well as several typos in the technical content:
- The use of one-page sub-figures (without a caption saying what the main Figure actually is). E.g., Figure 7 or Figure 12.
- The plain invention of new terms used as if they were standard terminology. E.g., "data graph", "OWL Application Ontology", "OWL Schema", "OWL instance", "OWL Mapping", etc.
- The inclusion of figures that do not contribute to the discussion. E.g., Figure 11 is just a screenshot of a piece of code that does a System.out.println().
- The use of a \footnote right after a table reference. E.g., Table 2^2, Table 3^3 and Table 7^4.
- A general sloppiness when presenting technical content. Taking Figure 10 as an example: in the left part, which should describe the XML schema of Figure 7(a), some attributes do not correspond to those in the XML schema (e.g., LineNumberID or the "ID" attributes), others are missing (e.g., ShipToPartyReferenceeID should be ShipToPartyReference with a nested attribute ID), and others are badly indented (e.g., the children ID and ShipmentRequestOrder of ShipmentRequest).
*** Supplementary Material ***
Not applicable.
*** Detailed Comments ***
Here I provide a non-exhaustive list of the issues I found.
- It is the first time I have seen an abbreviated name (the "Serm" in parentheses) appear in a list of authors. I wonder whether this is an accepted practice.
- Page 2, third paragraph: provide a citation for RDF schema.
- Page 2, third paragraph: the term "data graph" is not standard, nor introduced. Replace it with RDF graph.
- Capitalize all section references: section 3 -> Section 3, section 5 -> Section 5, etc.
- Page 4, first line: "object in a triple is a resource and has a Uniform Resource Identifier" -> technically incorrect, as an object might also be a literal or a blank node.
- Figure 7(a): Bad indentation for element "PartyReferenceType"
- Figure 7(b): "Carrir" -> "Carrier"
- All XML listings contain a lot of unnecessary content (e.g., all the "BaseType" types). I would simplify this, since they do not serve the example.
- Figure 12(a): this figure is neither a shipment request message nor a carrier request message. To avoid confusion, I would explicitly say that this is only a portion of a carrier request message.
- I find the translation from XML to RDF a bit unnatural, and I would have liked to see some justification for it. For instance, why does your translation always introduce URIs, even for values? E.g., consider attribute "sequenceNumber=2" in Figure 12(a). According to your translation, shown in Figure 12(c), this XML information is rendered in RDF through an "mc:hasSequenceNumber" property which, instead of being a "data property" having value "2" (RDF literal), is an "object property" connecting to a special URI called "#sequenceNumber_2", which is an instance of a class "#sequenceNumber" and is connected through the property "rdf:value" to the literal value "2". I am sure the authors had a good reason for proceeding the way they did; however, I fail to see the point, and no justification is provided.
- Section 5.3: there is a massive use of "OWL xxx" terms (used as if they were standard nomenclature) which makes the first paragraph very hard to read.
- Table 3, header: remove the "or Property", because properties are handled in the lower part of the table, which has its own header.
- Table 3, Line 2: the "exactly 1" restriction appears to be wrong, since from reading the XML I understand that a route might have more than one shipping item.
- Table 3: you use the term "ShippingItem", and then also say that a "ShippingItem" can refer to *a number of* items. I therefore find the name extremely confusing. Also, why not keep the same name used for the XML and RDF experiments, which was "ShipmentRequestOrderLine"? Changing the names of things across the different experiments only adds confusion.
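
To make the comment on the "sequenceNumber" translation (Figure 12(a) and Figure 12(c)) more concrete, the sketch below contrasts the two modelling choices using plain Python tuples as triples. It is only an illustration under stated assumptions: the namespace IRI, the subject "orderLine_1", the xsd:integer datatype and the helper functions are hypothetical and are not taken from the manuscript. Only the property name "hasSequenceNumber", the node "sequenceNumber_2", the class "sequenceNumber" and the use of "rdf:value" follow my reading of Figure 12(c).

```python
# Hypothetical namespace: the manuscript's actual IRI for the "mc:" prefix is not reproduced here.
MC = "http://example.org/mc#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"  # assumed datatype


def literal(lexical, datatype):
    """Represent an RDF literal as a tagged tuple, to tell it apart from an IRI string."""
    return ("literal", lexical, datatype)


line = MC + "orderLine_1"  # hypothetical subject, for illustration only

# (a) Indirect pattern, as I read it from Figure 12(c): the value is wrapped in its own resource.
indirect = {
    (line, MC + "hasSequenceNumber", MC + "sequenceNumber_2"),
    (MC + "sequenceNumber_2", RDF_TYPE, MC + "sequenceNumber"),
    (MC + "sequenceNumber_2", RDF_VALUE, literal("2", XSD_INTEGER)),
}

# (b) Direct pattern: the property points straight at the literal.
direct = {
    (line, MC + "hasSequenceNumber", literal("2", XSD_INTEGER)),
}


def objects(graph, subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?)."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def sequence_number_indirect(graph, subject):
    """Two hops: subject -> wrapper resource -> rdf:value literal."""
    values = []
    for node in objects(graph, subject, MC + "hasSequenceNumber"):
        values.extend(o[1] for o in objects(graph, node, RDF_VALUE))
    return values


def sequence_number_direct(graph, subject):
    """One hop: subject -> literal."""
    return [o[1] for o in objects(graph, subject, MC + "hasSequenceNumber")]
```

In this toy setting, pattern (a) needs three triples and a two-hop lookup to recover a value that pattern (b) stores in one triple and retrieves in one hop. There may be modelling reasons to prefer (a), but the manuscript should state them.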