Review Comment:
The paper presents the system OptiqueVQS that has been developed in the context of the EU project Optique together with stakeholders from industry. The system is based on a number of requirements that have been derived from use cases of the industry partners and translated into system features. The authors report on important considerations and design decisions as well as formalize the underlying semantics and querying expressivity. Finally, they summarize a number of user studies conducted to evaluate the usability of the system.
The paper is within the scope of the journal. It is a very informative and insightful read with a number of interesting results, including several smaller results like categorizations (e.g., types of query systems), tables (e.g., considered query types), lists (e.g., quality attributes and features), etc. My main concern is with the originality and novelty of the work, as the authors have already published several papers on Optique and OptiqueVQS in different conference proceedings and journals (as also indicated by the various self-references in the article). For instance, one closely related article on OptiqueVQS appeared in the journal Universal Access in the Information Society last year (see http://link.springer.com/article/10.1007/s10209-015-0404-5). While there seem to be different foci and incremental improvements, this somewhat limits the originality of the work and raises the question of whether interested readers will be able to distinguish between the different OptiqueVQS papers and select, read, and cite the right one in the end. Having raised this concern, I see sufficient novel contributions and insights in this submission to warrant another publication on OptiqueVQS and therefore recommend accepting the paper. Nevertheless, the authors should consider whether there is a better way to highlight the different contributions of the individual papers and to guide future reviewers and readers in immediately spotting the differences and selecting the right paper, i.e., the one best meeting their information need. The current list of novelties given at the end of the introduction is a good start but not sufficient in this regard.
Furthermore, the authors should carefully read through the paper before publication. Although the paper is generally well written, some sentences need attention and could be improved. There are also some basic language flaws that should be corrected before the paper is published. Apart from inconsistent punctuation (i.e., commas, but this is a minor issue), several nouns lack articles ("a", "the") where readers would expect one, for instance, in "a query as [a] whole", "rest of [the] article", "is [a] finite set". Also, singular nouns are sometimes used where there should be plural forms and vice versa (e.g., "represented as a knowledge base[s]", "when users interacts", etc.). There are also a few typos (e.g., "form [->from] the project’s website", "platfrom", "three-shaped", etc.), missing words (e.g., "that [is] well-suited", "used [to] expand", "usability [of] OptiqueVQS"), mistakenly inserted words (e.g., "[and] Rhizomer [16]", "for the all the") and commas (e.g., "standardised, semantics") as well as other minor language flaws (e.g., "a VQS is a data retrieval (DR) paradigm" or "A VQS have a better potential", etc.). Since at least some of the authors seem to be native English speakers, I assume that a careful reading would be enough to fix these minor language issues.
When revising the paper, the authors should also rethink some of the quite bold statements. For instance, I would not agree that Rhizomer and Konduit VQB "demand no technical background". Although I understand what the authors want to say and that this statement is detailed in Section 8, it is too bold at that place. At least for Konduit VQB, some technical background is needed in my humble opinion (if not, I would expect some kind of proof or at least a reference to a piece of research supporting this statement). Similarly, the following statement is quite bold: "A VQL is as difficult as a formal textual query language for a domain expert as it demands considerable technical skills and knowledge to interpret the visual semantics and syntax and understand the relevant technical jargon." Without any (empirical) evidence, convincing argument, or reference, these statements are too general and bold and should be formulated more moderately (at least in this kind of research paper). I would also disagree with the following sentence: "Browsing is a good approach when the data set and result set are not very large and users need to pay attention to each individual item in the result set." There are numerous examples of faceted browsing that prove the opposite (consider e-commerce websites like Amazon or hotel search websites like Booking.com). Again, I see what the authors want to say, but they should think of a better way to express it.
The design choices of the authors are mostly convincing and backed with good arguments. However, no reason is given for the following limitation where I would have expected one: "In this work we focus on construction of SPARQL queries where basic graph patterns do not have variables on the second position, nor on the third position, when e is rdf : type. That is, we do not allow predicates as variables, and thus our queries can naturally be represented as conjunctions of unary and binary atoms." As a minor remark, a partly repetitive sentence is used subsequently ("In our work we focus on construction of..."). This paragraph needs a revision.
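To make the quoted restriction concrete, here is a minimal sketch of how I read it. It rests on several assumptions of mine that are not taken from the paper: triple patterns are given as plain 3-tuples of strings, variables are marked by a leading `?`, the third-position restriction applies only when the predicate is `rdf:type`, and the names `:Wellbore` and `:hasCore` are purely hypothetical.

```python
RDF_TYPE = "rdf:type"


def is_var(term: str) -> bool:
    """Assumed convention: variables start with '?'."""
    return term.startswith("?")


def is_allowed(pattern: tuple[str, str, str]) -> bool:
    """Check a triple pattern against the quoted restriction (my reading)."""
    _subject, predicate, obj = pattern
    if is_var(predicate):
        # No variables in the second (predicate) position.
        return False
    if predicate == RDF_TYPE and is_var(obj):
        # No variables in the third position when the predicate is rdf:type.
        return False
    return True


def to_atom(pattern: tuple[str, str, str]) -> str:
    """Render an allowed pattern as a unary or binary atom."""
    if not is_allowed(pattern):
        raise ValueError(f"pattern outside the restricted fragment: {pattern}")
    subject, predicate, obj = pattern
    if predicate == RDF_TYPE:
        return f"{obj}({subject})"  # class membership becomes a unary atom
    return f"{predicate}({subject}, {obj})"  # any other predicate, a binary atom
```

Under these assumptions, `("?w", "rdf:type", ":Wellbore")` maps to the unary atom `:Wellbore(?w)` and `("?w", ":hasCore", "?c")` to the binary atom `:hasCore(?w, ?c)`, whereas `("?w", "?p", "?c")` is rejected. This shows what the restriction excludes, but not why it was chosen, which is the point the authors should address.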
I like the categorization of systems given in the introduction, i.e., the distinction of VQS, VQL, etc. Later on, another category of "visual query formulation systems" is introduced and it is stated that OptiqueVQS belongs to that category. However, it is unclear how this category relates to the initial categorization, i.e., whether it is yet another category, a subcategory, or just a different name for one of the above categories. Furthermore, the terms VQS and VQL are introduced a second time in Section 4, which is redundant, especially since this categorization is repeated yet another time in Section 8, where it is more appropriate.
The following aspects should get more attention in the revised version of the article:
- Generalization: How much can the results from the presented use cases be generalized? Are the results also valid in other contexts? If yes, to what extent?
- Limitations: What are the limitations of the approach? For instance, what are the limitations that result from restricting the queries exclusively to tree-shaped graph patterns?
- Backend implementation: I would have expected more details here. What technologies are used in the backend? How has the synchronization of the different views and underlying models been realized?
These aspects deserve at least a brief description and/or discussion. In turn, other parts of the article might need to be shortened (which should not be a problem).
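To illustrate the kind of boundary the limitations discussion could make explicit, the following sketch tests whether a query's binary atoms form a tree. It assumes that "tree-shaped" means the undirected graph over the query variables is connected and acyclic; the paper's formal definition may differ, and the function name and input format are hypothetical.

```python
from collections import defaultdict


def is_tree_shaped(edges: list[tuple[str, str]]) -> bool:
    """edges: (subject, object) pairs taken from the binary atoms of a query."""
    nodes = {node for edge in edges for node in edge}
    if not nodes:
        return True  # an empty pattern is trivially tree-shaped
    # A connected undirected graph is a tree iff it has |V| - 1 edges.
    if len(edges) != len(nodes) - 1:
        return False
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Depth-first search to confirm that every node is reachable.
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes
```

Under this reading, a chain such as `?w -> ?c -> ?s` is accepted, while a query that joins three variables in a cycle is not. A short discussion of which practically relevant queries fall into the rejected class would help readers judge the impact of the restriction.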
## Minor issues
- The term "information model" would benefit from a definition, since it might otherwise be differently interpreted.
- The word "use[-]case" is inconsistently written (with and without hyphen).
- The brackets in the example on page 3 should rather be "every wellbore has (at least) one core". It is also not the best example here, as readers might not see why a subclass construct has to be used in the OWL axiom (this is not obvious as restrictions are introduced later).
- The following is a bit too short: "(i.e., context-aware)" (p.12)
- The following number is missing a unit: "The second query (Exp3) only took 63 on average..."
- The authors might consider rephrasing the following sentences:
  - "The former is addressed as a part of quality attributes in Section 6, while in this section we address the local design choices concerning the implementation of individual widgets." This might be confusing due to the "quality attributes vs. quality features" distinction and reference to Section 6.
  - "The tasks were all conjunctive and shown in Table 4."