Converting neXtProt into Linked Data and nanopublications

Tracking #: 525-1727

Christine Chichester
Oliver Karch
Pascale Gaudet
Lydie Lane
Barend Mons
Amos Bairoch

Responsible editor: Oscar Corcho

Submission type: Dataset Description

The development of Linked Data gives databases the opportunity to supply extensive volumes of biological data, information, and knowledge in a machine-interpretable format, making previously isolated data silos interoperable. To increase ease of use, databases often incorporate annotations from several different resources. Linked Data can overcome many of the formatting and identifier issues that prevent data interoperability, but the extensive cross-incorporation of annotations between databases makes tracking provenance in open, decentralized systems especially important. Given the diversity of published data, provenance information is critical to providing reliable and trustworthy services to scientists. The nanopublication system addresses many of these challenges. We have developed the neXtProt Linked Data by serializing neXtProt-specific annotations in RDF/XML, and we have started employing the nanopublication model to give appropriate attribution to all data. Specifically, a use case on post-translational modification (PTM) data modeled as nanopublications illustrates how the different levels of provenance and data quality thresholds can be captured in this model.


Solicited Reviews:
Review #1
By Amrapali Zaveri submitted on 11/Oct/2013
Minor Revision
Review Comment:

The authors have revised the manuscript taking all comments into account. The manuscript is now in a state to be accepted. However, there are a few minor comments that should be fixed:
- The interlinks that the authors mention in their response should be described in a sentence or two in the manuscript.
- Merely mentioning the use cases does not provide much useful information. I would prefer a separate section describing these use cases, with one or two of them illustrated by a query.
- "...that as new data sets..." --> "...that new data sets..."
- "...for others users..." --> "...for other users..."

Review #2
Anonymous submitted on 23/Dec/2013
Review Comment:

The revised version of the paper is a clear improvement on the version originally submitted to the special issue on Linked Data sets. I am therefore happy to recommend acceptance: the paper now follows the guidelines for dataset descriptions much more closely, and I consider that the requests made in the initial reviews have been addressed adequately.

In particular, the division of the dataset into different levels of quality is very good, in my opinion. In fact, I am considering replicating this scheme in some of the datasets that my organisation is helping to publish as Linked Data.

I also acknowledge that the lack of a SPARQL endpoint is not a problem, since the dataset is available for download, and we all understand the difficulty of running an openly available SPARQL endpoint, given the human resources required to maintain and fine-tune one.

I have only a few minor comments, which the authors may address if they find them useful for improving the camera-ready version of the manuscript:
- The abstract is probably too long, and given the special issue in which this paper will appear, some of its sentences are unnecessary. This applies, for instance, to the first three sentences, which make general comments on Linked Data and its benefits. I would prefer that the authors get to the work they have done more quickly. I would suggest starting with "We have developed the neXtProt..." and keeping only those earlier sentences that discuss the nanopublication model.
- The same comment applies to the second paragraph of the introduction, which is not needed for this special issue. I would suggest removing that paragraph as well.
- In Section 2, the authors state that "the term schema is understood in the Linked Data context as the mixture of distinct terms from different RDF vocabularies that are used by a data source to publish data on the Web". This is not necessarily true: all terms may come from a single vocabulary, and the vocabularies may also be encoded in OWL. I would suggest removing this sentence.

Minor typos:
"a initial" --> "an initial"