Review Comment:
This paper has a number of minor flaws, but my principal reason for recommending rejection is that it does not live up to the premise that the authors establish. After a long and overly general preamble, the authors describe two efforts to annotate three different datasets with metadata in RDF and schema.org microdata. The premise is that doing so will make the datasets more discoverable and better connected, but this conjecture is never tested. The authors do not even discuss what "more discoverable" or "better connected" would mean in practice, nor do they suggest concrete, measurable objectives. Moreover, the two methods discussed seem somewhat incomparable: schema.org can, as the authors note, be used to affect search rankings. RDF metadata, however, requires another tool - such as Sindice or something similar - to find and process the published RDF. Attempting to compare apparently incomparable approaches leaves the reader little the wiser; the more so when no conclusions are drawn.
The paper has many minor errors, too many typos, and many places where claims are made without citation. Thorough proofreading is required. Among the more concerning errors:
* "in order to find something, it must be named" (section 1). I disagree: anonymous things may be found, by their description. Perhaps it would be better to say "in order to find something, it must be identified", where identification is taken to include both naming and identifying reference expressions.
* "actionable identifiers" (section 2). The action of an identifier is to identify; therefore "actionable identifier" is a tautology. Later in this section, the authors appear to mean "resolvable" rather than "actionable".
* "Web 3.0 is essentially a way to bridge the gap between human users and computerized applications". I'm not sure quite what this means, but humans have been using computerized applications, successfully, for a long time. To the extent that Web 3.0 means anything (other than a rather vague marketing term), I don't believe that it means this.
* "Resource Description Framework ... is a standard" (section 3.1). Not being an accredited standards body, the W3C is careful to state that it makes recommendations, not that it sets standards. This should perhaps read "... is a specification".
* "RDF is built from XML triples" (section 3.1). This is most emphatically wrong. RDF and XML are completely orthogonal: one can encode RDF using XML, but XML is not fundamental to the definition of RDF.
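* "RDF vocabularies are declared via namespace designations" (section 3.1). Also incorrect: an RDF vocabulary is a set of terms identified by IRIs, and namespace prefixes are only a syntactic abbreviation used by some serializations.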
* "Prior to ORE, groups of related resources could not be made visible on the web via URLs" (section 3.2). I'm not sure what the authors are trying to convey here, but I disagree. Collections can be described in HTML as ul/li lists, or in RDF with seq and bag, or simply by publishing a list of URLs in a text file.
* "on a finite project" (section 4). Are there infinite projects?
* "RDF requires a triple store, which may be overwhelming to [..] users. It is based on XML" (section 6.1). Users do not need a triple store to publish and make use of RDF metadata; they only need a tool which can process it. Semantic web search engines, such as Sindice, can do this without the user ever creating a triple store themselves. Also, as noted above, RDF is not based on XML.
* Section 6 is correctly labelled discussion, which is all that it does. It would be more helpful to the reader if it were labelled "Evaluation", and then proceeded to evaluate the different metadata and identification approaches against measurable criteria. It is not apparent to me that a dataset creator wishing to make their dataset more discoverable could use the results of this paper as anything other than general background to a decision about how, and where, to publish metadata on the dataset.
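
To illustrate the point raised above for sections 3.1 and 6.1, the following is a minimal sketch, not taken from the paper, of how the same RDF statement can be written in two syntaxes using only in-memory Python data and no triple store. The dataset URI and the helper names (`Triple`, `to_ntriples`, `to_rdfxml`, `split_iri`) are hypothetical and chosen purely for illustration; only plain string literals are handled.

```python
# Illustrative sketch only: the dataset URI and helper names are hypothetical.
import xml.etree.ElementTree as ET
from typing import NamedTuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class Triple(NamedTuple):
    subject: str    # IRI of the thing being described
    predicate: str  # IRI of the property
    obj: str        # plain string literal (the only object type in this sketch)


def to_ntriples(triples):
    """Serialize triples as N-Triples, a line-based syntax with no XML involved."""
    lines = []
    for t in triples:
        literal = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{t.subject}> <{t.predicate}> "{literal}" .')
    return "\n".join(lines)


def split_iri(iri):
    """Split an IRI into (namespace, local name) at the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/")) + 1
    return iri[:cut], iri[cut:]


def to_rdfxml(triples):
    """Serialize the same triples as RDF/XML, one optional encoding among several."""
    ET.register_namespace("rdf", RDF_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for t in triples:
        desc = ET.SubElement(
            root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": t.subject}
        )
        ns, local = split_iri(t.predicate)
        prop = ET.SubElement(desc, f"{{{ns}}}{local}")
        prop.text = t.obj
    return ET.tostring(root, encoding="unicode")


# One statement: a (hypothetical) dataset has a Dublin Core title.
dataset = [
    Triple(
        "http://example.org/dataset/1",
        "http://purl.org/dc/terms/title",
        "Example dataset",
    ),
]

ntriples_text = to_ntriples(dataset)
rdfxml_text = to_rdfxml(dataset)
print(ntriples_text)
print(rdfxml_text)
```

Both outputs describe the same statement; the XML form is simply one way of writing it down, which is why describing RDF as "built from XML" or as requiring a triple store is misleading.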