OWLIM: A family of scalable semantic repositories

Barry Bishop, Atanas Kiryakov, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev, Ruslan Velkov
An explosion in the use of RDF, first as an annotation language and later as a data representation language, has driven the requirements for Web-scale server systems that can store and process huge quantities of data, and furthermore provide powerful data access and mining functionalities. This paper describes OWLIM, a family of semantic repositories that provide storage, inference and novel data-access features delivered in a scalable, resilient, industrial-strength platform.
Submission type: 
Tool/System Report
Responsible editor: 
Axel Polleres

The current manuscript is a revision, which has now been accepted for publication. The reviews below are for the originally submitted version.

Solicited Review by Andy Seaborne:

This is a systems description paper about SwiftOWLIM (in-memory RDF database system) and BigOWLIM (disk-backed RDF database system including clustering).

This paper has common content with the submission "FactForge: A fast track to the web of data". This duplication should be removed; the two papers would then read very well together, one on the system, one on the application.

The paper does tend to describe OWLIM features in isolation, and not connect them to other work. Not all the features are novel or unique to OWLIM.

Section 2:

This section assumes knowledge of RDF, and briefly describes Linked Open Data. Linked Open Data is not mentioned to any significant degree in the rest of the paper. The section about Linked Open Data can be removed.

"it has become widely used"

The claim of "widely" is dubious.

Section 3:

"full-featured" - what does this mean?

"essentially open-source" is not open source. Just say "free-for-use".

"RDBMS-style-license" - needs clarification.

Section 3.1:

ter Horst ref [10] is wrong - it's [8]. I have not checked every reference in the paper, but if one is wrong, it suggests that the references are not in sync and so other references may be wrong.

RDF Syntaxes - misses RDFa which is one of the two standardized formats.

OWL2-RL rules prp-spo2, prp-key: this part of the section is very hard to understand. At a minimum, show the rules, not simply name them.
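For reference, the OWL 2 RL profile defines these two rules roughly as follows (paraphrased from the W3C specification; T(s, p, o) denotes a triple pattern and LIST[...] an rdf:List of the listed members):

```
prp-spo2:  T(?p, owl:propertyChainAxiom, ?x)
           LIST[?x, ?p1, ..., ?pn]
           T(?u1, ?p1, ?u2)  ...  T(?un, ?pn, ?un+1)
           =>  T(?u1, ?p, ?un+1)

prp-key:   T(?c, owl:hasKey, ?u)
           LIST[?u, ?p1, ..., ?pn]
           T(?x, rdf:type, ?c)  T(?x, ?p1, ?z1)  ...  T(?x, ?pn, ?zn)
           T(?y, rdf:type, ?c)  T(?y, ?p1, ?z1)  ...  T(?y, ?pn, ?zn)
           =>  T(?x, owl:sameAs, ?y)
```

Both rules match a list of arbitrary length in the antecedent, which is why they are awkward to express without recursive rules.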

Section 3.2: How does this relate to integrity constraints in SQL-databases?
An example of an inconsistency would be useful.

Section 3.3:
real-world concept:
URIs denote both information and non-information resources.

"probably the most important" - this claim should be removed. It is not substantiated and using "probably" makes it valueless.

The discussion of different handling of owl:sameAs is not specific to OWLIM. Using prototype chains / equivalence sets (there are many names) is a common technique in AI to reduce the problem to one where the unique name assumption can be applied.
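To illustrate how standard the technique is, here is a minimal sketch (Python; purely illustrative, not OWLIM's actual implementation) of the equivalence-set approach: owl:sameAs members are collapsed into a single canonical representative via union-find, after which the unique name assumption can be applied to the representatives:

```python
class EquivalenceSets:
    """Union-find over URIs: collapses owl:sameAs cliques into one representative."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        """Return the canonical representative of uri's equivalence set."""
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def same_as(self, a, b):
        """Assert a owl:sameAs b by merging their equivalence sets."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

# Hypothetical URIs, for illustration only:
eq = EquivalenceSets()
eq.same_as("ex:Springfield", "dbpedia:Springfield")
eq.same_as("dbpedia:Springfield", "geo:Springfield")
assert eq.find("ex:Springfield") == eq.find("geo:Springfield")
```

Indexing and joins then operate on `find(uri)` rather than raw URIs, which is what lets a store avoid materialising the quadratic number of owl:sameAs-derived triples.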

"the number of 'duplicated' results returned." - in what sense are they duplicated? Needs clarification.

Section 3.4:
para 1: how does this relate to the published work in Sesame on truth maintenance? Is OWLIM using the original Sesame techniques? This section reads as if the technique is new, but it also describes something that is a previous technique.

How do forward-chaining techniques like RETE fit in with persistence?

Section 3.6:
This section uses the LUBM benchmark but there is a better benchmark in UOBM. LUBM is weak in that it only needs per-university inference.

"probably" -- this claim is meaningless.

"BigOWLIM outperforms AllegroGraph and Sesame ..." but OWLIM is using Sesame for part of its operation. It would be valuable earlier in the paper to state what is OWLIM and what isn't. For example, how much of SPARQL is implemented by OWLIM and how much inherited from Sesame.

Section 3.7:
It is not clear whether TripleSets are Sesame-style contexts built dynamically or a fundamentally different technique with different features. Examples or more detail is needed. How does it fit with SPARQL named graphs? How does it fit with inference, given that inference is scoped across a graph?

Section 4.2:

"RDF search is a novel information retrieval concept"
This appears to be very closely related to the Talis Platform search facility.

The integration into SPARQL covers the same ground as features that Virtuoso, Jena, Mulgara and Sesame offer, even if the details differ.

"molecule" - this terminology suddenly appears - explain and reference.

Section 4.3:
Section 4 is "Beyond RDF and SPARQL" - features of the system for scale and management do not fit under that title. A separate section on system features should collect together the platform issues. Include Section 3.5 (transaction management) and maybe benchmarking commentary.

How are updates replicated across the cluster by the single controlling master?

Section 4.4:

Ref [10] is wrong. [10] is a reference to Jena (not used in the paper as far as I can see).

A later reference to Jena is in WWW 2004.

Section 5:
This is mostly not a conclusion but about futures. Rename the section.

"Sesame and Jena, the two most widely used RDF frameworks" - true for Java but not clear that is true for all systems - what about Redland and its use with scripting languages?

Solicited Review by Aidan Hogan:

The authors present the OWLIM family of systems, which includes SwiftOWLIM – an in-memory lightweight tool which offers query-answering over RDF data, including materialised inferences according to customizable (or standard) rule fragments – and BigOWLIM – the "big brother" of SwiftOWLIM which is better suited to applications with emphasis on scale.

As this is a system description paper, the criteria for review are quite different to those of a standard research paper. From [1], the following are the main criteria for such a review:

"(1) Quality, importance, and impact of the described tool or system. (2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool."

The review continues under these headings:

* Quality, importance, and impact of the described tool or system. *

From an engineering perspective, the quality of the tool is undoubtedly high. Although I personally do not have experience of using the tool, I know of its successful inclusion (esp. SwiftOWLIM) in various other systems. The authors have clearly invested much time and effort into creating a fully-featured suite of tools which take a common-sense approach to supporting query-answering over RDF, with much emphasis on efficiency/scalability; indeed, independent evaluations comparing query-answering and inferencing tools (referenced in the paper) have confirmed that it is in general much superior to its peer systems. In general, I believe that the OWLIM family of systems offers a good balance of efficiency and scalability without the usual loss of features prominent in other such systems.

The tool also demonstrates impressive levels of scale, discussing indexing of billions of triples on a single machine. Although comparable levels of scale have been demonstrated in the research literature, none have looked at supporting full SPARQL with inferencing, updates, deletes, etc. at this scale.

With respect to importance of the tool, OWLIM offers core support for query-answering and inferencing, with BigOWLIM providing complementary support of, e.g., replication, full text search, IR-inspired ranking, etc. As such, these primitives are at the core of many Semantic Web related applications, and industrial strength solutions in this area are of huge importance for getting real customers to adopt RDF-centric technologies.

With respect to impact, I know the system to have been included in various applications: not only research or academic prototypes, but also in at least one real-world application for the BBC [2]. This is one thing I miss from the paper: a quick subsection on adoption of the tool, including research projects using or adapting the tool, or other use-case scenarios (including [2]). I see the cursory mention of FactForge, but you should dedicate a half-column to the impact of the tool to-date. (If you need room, you should shorten the owl:sameAs optimisation discussion, which could be summed up more succinctly, and in any case is already standard fare for any scalable reasoner doing owl:sameAs reasoning in the literature.)

* Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool. *

In general, the paper is easy to read, descriptive and informative. Descriptions of features and optimisations are heavily example-based, with an anecdotal style which seems well suited to a system paper (although not detailed enough for a research paper – in fact, I'm interested to know some more specifics which I grant *should* be out of scope for the current work). The paper is well-balanced, and gives a good overview of both the functionality and optimisations of the system.

One thing I'm concerned about is the lack of discussion about the limitations of the system: you do include some discussion – for example stating that supporting the OWL 2 RL rule "prp-key" is made optional due to the expense associated with computing inferences over it. However, I feel that in one or two other places, the paper should tone down or make conditional some of its claims; for example:

"BigOWLIM uses a number of storage and query optimizations that allow it to sustain outstanding performance even when managing tens of billions of statements".

Demonstrating scale and efficiency over LUBM and certain other datasets is one thing; making so generic a claim is quite another. For certain inputs, certain rules and certain queries, it is currently infeasible (nay impossible) to support the functionality you claim with the "outstanding performance" you claim at the scale you claim. For example, arbitrary SPARQL processing is known to be PSPACE-complete [3], materialisation of transitive inferences (included in pD* and OWL 2 RL) is obviously quadratic with respect to data size, etc. Your claim is too strong here. You can still make a strong claim like: "based on independent evaluation, BigOWLIM is unparalleled in terms of the intersection of its features, scalability, flexibility and efficiency", or "BigOWLIM offers near-optimal processing of...", or "for 'reasonable' inputs and requirements...". I've encountered all too many people who have been led to believe that full and efficient SPARQL processing at large-scale has been *solved* (similarly for materialisation), and have been sorely disappointed to find out that no system can handle their 24-join queries over billions of triples, or their six-hop foaf:knows queries over Linked Data.
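The quadratic materialisation point is easy to make concrete with a toy calculation (Python; purely illustrative): a simple chain of n asserted edges under a transitive property forces n(n+1)/2 materialised triples.

```python
def materialise_transitive(edges):
    """Naive materialisation of a transitive property:
    apply (x p y), (y p z) => (x p z) until fixpoint."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

# A chain of 40 asserted edges over nodes 0..40...
chain = [(i, i + 1) for i in range(40)]
closure = materialise_transitive(chain)
# ...materialises 40 * 41 / 2 = 820 triples: quadratic in the input size.
assert len(closure) == 820
```

A real rule engine would use semi-naive evaluation rather than this brute-force fixpoint loop, but the size of the output, and hence the lower bound on materialisation cost, is the same.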

Similarly, I interpret that you have a known (albeit high) theoretical upper limit on the amount of data that you can handle, based on the resources (main-memory) of the best machine. Your distribution strategy is pure-replication, and thus I figure that you cannot scale by adding machines: you can only increase performance under high concurrent processing (and of course, improve fault-tolerance). If this is true, I would like to see some explicit discussion in the replication cluster subsection outlining these scale limitations. On the 64GB machine, you index 21 billion LUBM triples (inc. 9 billion materialised triples): is this the upper-bound on scale with that hardware? If not, could you estimate it?

Finally, please give a brief description of some possible use-case scenarios for the "Triplesets" functionality. You state "There are many situations..."; please provide an overview example or two. Also, the "RDF Priming" function section is a little difficult to follow: for example, how does this work given concurrent access? That is, the priming is done independently of and prior to the main query: if one user primes one "type" of node (Toyota, Japanese_Cars), and another concurrently primes another "type" of node (SEAT, Spanish_Cars), how does the system resolve the competing primed nodes? How does it relate the subsequent main query with the priming steps?

In any case, I should re-emphasise that the paper is well written, and the above comments relate to suggestions for improvement as opposed to any serious issues with presentation.

* Abstract: You state that the first use of RDF was as an annotation language: what do you mean by this? When was the change over to a data representation language as you describe it now? In Section 2, you reference Tim Berners-Lee's seminal Semantic Web paper: this paper does not mention the word "annotation".
* OWL-Horst is more formally called pD*: maybe stick in brackets alongside.
* What is owl-max? Do you have a reference? I don't know how you can merge RDFS and OWL-Lite, since RDFS has RDF-based semantics and is rule-expressible, and OWL-Lite is a "sublanguage" of OWL DL. What are OWL-Lite rules?
* Why do only prp-spo2 and prp-key require recursive rules? If you have built-in rdf:List support, no rule should require recursive rules. If you don't have built-in support, then – besides prp-spo2 and prp-key – eq-diff2, eq-diff3, prp-adp, cls-int*, cls-uni, cls-oo, cax-adc, scm-int and scm-uni also require recursive rules, like in [4]. Similarly, why does prp-key stand out as particularly expensive, over say prp-spo1 which could require arbitrary length path joins?
* Do you also have any words on rules such as eq-rep-*? Or rdfs4*? Also, I'm curious about the handling of dt-eq and dt-diff: do you support them? Similarly, how do you support prp-spo2 and prp-key using recursive rules? The solution in [4] requires the rule engine to support quadruples (or to invent blank nodes). Perhaps, however, this might be too detailed for the current scope.
* Consider some highlight (tt?) formatting for URIs.
* You state that "(we should admit though that in reality, there are not that many examples of large owl:sameAs equivalence classes)". While I admire your honesty, this is incorrect, esp. for Linked Data. In [5], we found an equivalence class of size 85,803 inferable through noise for inverse-functional property inferencing. Aside from noise, we found a valid equivalence class size of 32,390 containing blank-nodes which referred to a global-user on the Vox blogging platform. Aside from this older data and blank nodes and inverse-functional inferencing, in [6] we found (a broken) equivalence class size of 8,481 from transitive/symmetric closure of asserted owl:sameAs in a 1b statement Linked Data crawl, with a valid equivalence class containing 443 identifiers. This is set to grow given LD publishing patterns which dictate coining redundant but dereferencable URIs, with subsequent owl:sameAs linkage. We assume large equivalence class sizes, and we've used the same optimisation that you do (e.g., in work on SAOR), as have others (e.g., WebPIE); you can safely remove the above quoted text.
* You state that for SwiftOWLIM you "[re-compute] the full-closure whenever an update is committed." Do you mean a deletion? Why would you remove the materialised statements if you're only doing additional insertions?
* The references are very messy and need to be cleaned up: make capitalisation consistent; make name formatting consistent; order by second name of first author; make URI links consistent, etc.


* Section 1: Capitalise "section 4"
* Section 2: "semantic Web" -> Semantic Web (if you're describing *the* Semantic Web).
* Section 2: "Web of data" -> Web of Data (two occurrences).
* Section 3: "1000 USD machine [using] with inference".
* Section 3.1: "in to" -> "into"
* Section 3.2: noindent after first example
* Section 3.3: "Since as gno:parentFeature" -> Since geo:parentFeature
* Section 3.5: "to complete[.] Furthermore, update"
* Section 3.5: "owl:sameAs" -> \textbf{\texttt{owl:sameAs}} (check whole paper)
* Section 3.5: Capitalise "section 3.3" (check whole paper)


*** To clarify, these references are just for review purposes, and do not need to be added to the paper.

[1] http://www.semantic-web-journal.net/content/special-call-semantic-web-to...
[2] http://lists.w3.org/Archives/Public/semantic-web/2010Jun/0143.html
[3] Jorge Pérez, Marcelo Arenas, Claudio Gutierrez: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3): (2009) http://www.dcc.uchile.cl/~cgutierr/papers/sparql.pdf
[4] http://www.w3.org/TR/rif-owl-rl/
[5] Aidan Hogan, Andreas Harth, Stefan Decker: Performing Object Consolidation on the Semantic Web Data Graph. Proceedings of I3: Identity, Identifiers, Identification. Workshop at 16th International World Wide Web Conference (WWW2007), Banff, Alberta, Canada, 2007. http://sw.deri.org/2007/02/objcon/paper.pdf
[6] Aidan Hogan, Andreas Harth, Jürgen Umbrich, Sheila Kinsella, Axel Polleres, Stefan Decker: Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine. DERI TR, 2010. http://www.deri.ie/fileadmin/documents/DERI-TR-2010-07-23.pdf

Solicited Review by Giovambattista Ianni:

The paper overviews the two OWLIM semantic repositories, SwiftOWLIM and BigOWLIM. The paper content excellently fits the call for Semantic Web Tools and systems. OWLIM stores are mature and certainly deserve visibility in the community: also, the details provided are sufficient to understand how the main technical difficulties have been solved in the implementation, and which are the features of both systems.
A short benchmark report and comparison with other systems are however missing, perhaps for space reasons.

Nonetheless, my main remark regards the significant relationship between this paper and http://www.semantic-web-journal.net/content/new-submission-factforge-fas..., which has been pointed out.

Before writing my review, I read both: the former focuses on the OWLIM systems family, while the latter claims to focus on FactForge. The latter can be seen as a presentation/interface layer for OWLIM systems. However, this second paper is heavily biased toward describing details of the inner OWLIM layer, more than FactForge. Benchmark reports, the description of the inference set-up process, and many other details would fit much better in a longer paper about OWLIM systems.

I see three options for solving the issue:

1. Since the special issue calls for papers of about 8-10 pages, the authors should collapse the two papers and submit to the normal journal track.

2. Another possibility would be to suggest that the authors reword the FactForge paper, changing its introduction (which is somewhat misleading) and explicitly describing it as a companion to the main paper.

3. On the other hand, I see the problem is not in the paper whose review I'm submitting. This one is, per se, more than sufficiently self-contained and detailed, and could be published alone.

If space allows, I would like to see more details about the RDF Priming technique, or the insertion of pointers to other contributions from the same authors.