Using the relation ontology Metarel for modelling Linked Data as multi-digraphs

Paper Title: 
Using the relation ontology Metarel for modelling Linked Data as multi-digraphs
Authors: 
Ward Blondé, Erick Antezana, Vladimir Mironov, Stefan Schulz, Martin Kuiper, Bernard De Baets
Abstract: 
The Semantic Web standards OWL and RDF are often used to represent biomedical information as Linked Data; however, the OWL/RDF syntax, which combines both, was never optimised for querying. By combining two formal paradigms for modelling Linked Data, namely multi-digraphs and Description Logic, many precise terms for relations have emerged that are defined in the Metarel relation ontology. They are especially useful in Linked Data and RDF knowledge bases that 1) rely on SPARQL querying and 2) require semantic support for chains of relations. Metarel-described multi-digraphs were used for knowledge integration and reasoning in three RDF knowledge bases in the domain of genome biology: BioGateway, Cell Cycle Ontology and Gene Expression Knowledge Base. These knowledge bases integrate both data, like KEGG, and ontologies, like Gene Ontology, in the same RDF graphs. Their libraries with biomedically relevant SPARQL queries show the practical benefits of this semantic paradigm. In addition to the management of RDF stores, this paper describes how Metarel can be used for remodelling Linked Data as SPARQL-friendly and semantically rich multi-digraphs. Metarel can be downloaded from http://www.semantic-systems-biology.org/metarel.
Submission type: 
Full Paper
Responsible editor: 
Guest Editors
Decision/Status: 
Accept
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/special-issue-linked-data-healt...

Revised manuscript after a "reject and resubmit" and a subsequent "accepted with minor revisions". Now accepted. First round reviews are beneath the second round reviews, which are beneath the final round review.

Solicited review by Carlos Aranda:

The latest changes significantly improved the quality of this paper. It is now more readable (at least for me) and the Linked Data concepts are much clearer. Besides, the research is understandable and it is an interesting approach for consuming biomedical RDF data.

Second round reviews:

Solicited review by Carlos Aranda:

This paper presents research on annotating RDF data (or OWL ontologies) with an ontology that allows these data to be queried in a simpler way. One example is to add to the RDF data (:john, :lives_in, :London) an extra relation like (:lives_in, :label, "some description") so that SPARQL querying can be done in a way that is more comprehensible for users. To ease access to biological data in RDF, the Metarel ontology is used as an upper model that allows users to query the data easily. This approach is used in three biological (RDF) knowledge bases (Biogateway, CCO and GeXKB).

In my previous review I found it hard to understand the research done, but I think this has clearly been improved. The authors use more examples and the paper is more comprehensible now. However, the Linked Data part of the paper can be improved, since some definitions are not accurate.

In more detail, in the Introduction section, the authors say that Metarel is an ontology for relations, but they do not specify what kind of relations, the purpose of these relations, etc. This is a very general sentence that I would rewrite to fit better in the introduction. Regarding the next paragraph, the authors say that Metarel should fill the gap between "semantically rich data in OWL, and semantically poor triples". I thought that data is published in RDF, not in OWL, and that OWL is the language for creating the model/ontology. I would rephrase this to explain it better.

In the background section, saying that the "paradigm of Linked Data is the latest concept created by TBL" is rather abstract. Also, in the rest of the paragraph the authors only state some generalities about Linked Data. There are many nice descriptions, and even Wikipedia provides a better explanation of what Linked Data is and the work that TBL has done. I would suggest that the authors rewrite the introduction about what Linked Data is.
In the next paragraph, SPARQL is not the preferred language for querying RDF; it is the standard language, and saying that these technologies are based on triples and IRIs is rather vague. Also, I would suggest providing a more detailed description of the concept of dereferenceability, since dereferencing a resource URI does not only work in web browsers [1]. Also, the concept of Linked Data they describe is not exact: there are not two types of links; there are URIs that can be dereferenced in different manners, depending on how the user is accessing the data (using an RDF browser, a Web browser, etc.). Do not forget that Linked Data is meant for machines rather than humans. In the next lines, the authors talk about blank nodes, saying that they are local identifiers, which is not correct. Sometimes blank nodes are identified as existential variables, other times as identifiers. A nice reading about blank nodes is [2]. I think the authors should explain the concepts of Linked Data, RDF and SPARQL more clearly, since the journal special issue is exactly about Linked Data.
When talking about SPARUL, I would also talk about SPARQL 1.1 Update, since it is the successor of SPARUL and almost a W3C Proposed Recommendation.

The other sections are nicely written and understandable. The research contributions of the paper focus on easing access to biological RDF data; for that, the authors developed Metarel and adapted it for use in Linked Data, which appears to be a good and useful improvement.

The practical implementation section is also clear and nice to read. I personally think that the interesting part of the paper is that one. Explaining how the authors use Metarel for linking the three different datasets would be very good.

Solicited review by Matt-Mouley Bouamrane:

The revised article has addressed most of the reviewers' comments.

First round reviews:

Solicited review by Carlos Aranda:

This paper (Using the relation ontology Metarel for modelling Linked Data as multi-digraphs) describes how to use the Metarel ontology for modeling linked data. The Metarel ontology was described in a previous paper. There, the ontology was used for modeling class-level relations and, from what I understood, it is used in the same way in this paper, but for modeling linked data. Using this approach, it seems easier to use SPARQL for querying the properties of the ontology.

The main problem I had when reading this paper was that it was difficult for me to understand. I was not really sure what the authors were presenting in this work until I arrived at section 3.2, in which an example is presented. Also, having a look at the original Metarel paper helped me understand the current paper. Thus, my main comment on this work is to rewrite it in a clearer way, especially by placing the example in one of the first sections and then explaining all the research problems from that point on. I think this will help very much in understanding this work.
Another comment/question is whether the authors have already used this approach for querying linked data, and which datasets they used. It is not clear in the paper whether they used the ontology with any RDF dataset in the Linked Open Data cloud.

Solicited review by anonymous reviewer:

The paper describes the use of the Metarel ontology to allow the modelling of Linked Data as MDGs. The paper is interesting and has been correctly written and explained. Here are just some comments that could improve the paper a little more.

1) In the first place, I have to mention that I found the introduction section very interesting. You have explained a lot of concepts (and the relations among them) in a reasonable amount of space. However, taking into account that your paper is focused on Linked Data, I think that in this section you could say a little more about LD and its implications in the bio* world. In the current version of the paper you only have a single paragraph about this. I think that a little more information, with references to the current and most significant efforts in the use of Linked Data in the biomedical field, should be presented.

2) The definition of Metarel's ontological vocabulary is interesting and very explicit. However, I think that for non-expert users it can be a little confusing, since you are talking about many types of relations and you only provide examples for a couple of them. Introducing some examples to help in understanding the types of relations may be useful.

3) In the abstract you mention "they are specially useful in rule-based systems for Linked Data and RDF Knowledge bases..". As computer scientists, when we talk about rule-based systems (RBS, not to be confused with RDF-Based Semantics), we think of classical systems based on rules in several formats such as Prolog, Lisp, CLIPS, etc. However, in the context of your paper you talk about rules or rule languages using SPARUL. I'm not even close to being an expert in SPARUL and related languages, but I'm not sure whether SPARUL can be considered a rule language, at least not in the classical computer-science sense. For this reason, it may be useful to insert some kind of explanation to avoid confusion.

Solicited review by Matt-Mouley Bouamrane:

Contribution:

The authors describe a framework for modelling Linked Data as multidigraphs, directed graphs with multiple edges connecting the same source and target nodes, using Metarel, an RDF taxonomy of properties and relations between properties (meta-relations) which has been described previously elsewhere as enabling relations-inference on RDF biomedical knowledge-bases.

It appears that the article's contribution is essentially in proposing a new modelling paradigm for Linked Data which would facilitate RDF (SPARQL) querying.

Some suggestions:

I think the article would benefit from some substantial re-writing to clarify the original contribution, with more of an emphasis on the benefits of the approach and practical implementations and evaluations.
The abstract would benefit from some rewording, with further emphasis on contributions and implementations.

The introduction is far too long and I would suggest the authors introduce a separate background section where much of the introduction could go.

A couple of paragraphs, briefly stating the context, emphasis on present contribution and structure of the article would suffice for the introduction.

The section 2 on Metarel development could be tidied up and clarified.

Section 3 on practical implementations would benefit from some substantial revisions and, in particular, further emphasis on application to biomedical KBs, as this seems to have been the premise behind the original development of Metarel. Subsections 3.1 and 3.2 could be substantially shortened, while Section 3.3 seems to have already been described elsewhere (Antezana et al., 2009).

Some of the proposed contributions of the framework are discussed in section 4, but I think they need to come across more prominently throughout the article.

Other comments:

Although I am conscious this is a submission to the Sem Web journal - and as such many of the readers will be familiar with the acronyms used throughout the article - it is always good practice to define all the acronyms the first time they are used in the article (e.g. OWL, RDF, SPARQL, etc.) if not in the abstract at least in the body of the article.
