Review Comment:
SUMMARY
This paper presents a concurrent divide-and-conquer algorithm for the classification of OWL ontologies. It comprises the following results:
Sections 3–4: A parallel merge sort algorithm for sorting partially ordered sets.
Section 5: A heuristic partitioning scheme based on told subsumptions.
Section 6: An experimental evaluation of the approach, showing speedups for varying numbers of concurrent workers using the JFact reasoner.
I like the underlying idea of experimenting with merge sort for parallelizing classification. The authors show that on some ontologies this can lead to speedups of up to 4x, which is competitive with other state-of-the-art concurrent OWL reasoners. This result can serve well as a short workshop/conference paper; indeed, a shorter version of this work has already been published. However, I do not think this work meets the standards of a journal publication, either in terms of contributions or in quality of presentation.
Firstly, the contributions are overstated. A large majority of the paper (pages 4–19) focuses on developing this merge sort for arbitrary partially ordered sets, only to end up with a fairly straightforward recursive procedure for merging two hierarchies into one. I wonder if the authors are reinventing the wheel here. Unfortunately, there is no discussion of any related work on general poset sorting, and I am not an expert on this topic either. A quick online search finds a number of papers, such as
“On Poset Merging” by Chen, Ding, and Seiden
“Sorting and Selection in Posets” by Daskalakis, Karp, Mossel, Riesenfeld, and Verbin
that seem to present more sophisticated merging procedures and even discuss the optimal number of comparisons. I would therefore not consider this part of the paper to be a novel contribution.
Secondly, the paper is technically weak and not presented very well. Readers are often referred to study pseudocode. Intuitive explanations, when given, are imprecise and sometimes even misleading. Mathematical notation and terminology are used sloppily. Space could be used better: on the one hand, long passages (Propositions 1–8, Figs. 3–6) are devoted to trivial properties of transitive relations; on the other hand, the identification of equivalent concepts, which seems to be an essential part of the algorithm, is dismissed in a footnote for the “sake of conciseness”.
MORE DETAILS AND SUGGESTIONS
It was very hard for me to understand the relationship between your two merge algorithms. On page 7, introducing T_merge, you say: “However, this method does not make use of the so-called told subsumption… For example, it is unnecessary to test… Therefore, we designed a novel algorithm T_merge”. This sounds as if the earlier merge procedure were already a correct sorting algorithm and you were now optimizing it to somehow benefit from told subsumptions. Only after carefully diving into the pseudocode did I realize that the earlier procedure is actually incomplete, and T_merge is not an optimization but simply a fix that makes the algorithm correct. If you really want to keep this presentation, please state this explicitly as the main motivation. But this makes me wonder why you present the incorrect algorithm in the first place. Would it be possible to just directly present T_merge as a single recursive procedure? Something like taking Algorithm 6 and, instead of returning anything, attaching C under D in line 15.
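To make the suggestion concrete, here is a minimal sketch of the kind of single recursive procedure I have in mind. All names, the tree-shaped hierarchy representation, and the subsumption oracle are my own illustration for the simplest case, not the paper's code:

```python
def insert(hierarchy, subsumes, c, d="Top"):
    """Attach concept c under the most specific descendant of d that
    subsumes it. For simplicity the hierarchy is a tree, stored as a
    dict mapping each node to the set of its direct children, and
    subsumes(x, y) is an oracle answering whether x subsumes y."""
    for child in hierarchy.get(d, set()):
        if subsumes(child, c):
            # A more specific subsumer exists: descend into it.
            insert(hierarchy, subsumes, c, child)
            return
    # No child of d subsumes c, so d is the lowest subsumer: attach here.
    hierarchy.setdefault(d, set()).add(c)
```

Whether this matches your Algorithm 6 line by line I cannot tell, but one self-contained procedure of this shape would be much easier for a reader to verify than two mutually dependent ones.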
Also, in the quote above, referring to these as “told subsumptions” does not help. I understand a told subsumption to be one that is easily derivable from the input axioms without full reasoning, and you actually use the term in this sense in Section 5. Here, however, you seem to mean the (complete) subsumption relationship over the subdomain that was derived in the recursive call.
It is good that you decided to explicitly define the input and output of each procedure. But please make this more formal, and make sure that it is valid for every possible (including recursive) call of the procedure, not only for the topmost call. For example, consider Algorithm 3. First, the algorithm should also have “D subsumes C” as a precondition, since you return D as a parent of C without checking. Second, the procedure does not return the set of all parents of C, but only those that are descendants of D. Finally, using the notation {p \in \leq_i} for the parents of C is incorrect, because C is not in the hierarchy \leq_i at all at this point.
On the same note, in Algorithms 4 and 5, what exactly is “output: the merged subsumption hierarchy \leq_\alpha over \leq_\beta”? Presumably this is not the complete hierarchy over the union of the two domains, because you still have to run bottom-merge to make it complete. It is not even a hierarchy in the sense you defined in the preliminaries: it can have multiple bottoms, as in Fig. 10, right? It would help if you defined these data structures more precisely. Also, do you maintain your hierarchies transitively closed or transitively reduced, or maybe neither? On the one hand, you use the notation (A \in \leq_\alpha) to mean that A occurs in the hierarchy, which suggests that the relation is transitively closed. On the other hand, in line 4 of Algorithms 4 and 5, you only add pairs where A is the immediate parent of B, so the “merged” hierarchy is no longer transitively closed?
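To illustrate why I am asking, here is a tiny sketch (my own encoding, not the paper's) of the same three-concept hierarchy stored transitively closed versus transitively reduced, and how the two kinds of membership tests behave differently on them:

```python
# The hierarchy Top > A > B, stored as a set of (parent, child) pairs.
closed = {("Top", "A"), ("Top", "B"), ("A", "B")}   # transitively closed
reduced = {("Top", "A"), ("A", "B")}                # transitively reduced

def occurs(concept, relation):
    """Occurrence test of the form (A \in \leq): does the concept
    appear anywhere in the stored relation?"""
    return any(concept in pair for pair in relation)

# The occurrence test works on both representations,
# but looking up an inherited pair only works on the closed one:
# ("Top", "B") is in `closed` but not in `reduced`.
```

So if the merged structure is built by adding only immediate-parent pairs, any later step that relies on pair lookups over a closed relation silently breaks.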
In Section 5, can you describe precisely, in words, what the algorithm does, rather than asking the reader to digest a recursive function with four parameters? I eventually understood the algorithm as follows: iterate over all (direct) children of top; for each child c of top, add a new partition consisting of all told descendants of c that do not yet belong to any of the earlier partitions. Is this correct?
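If my reading is right, the whole scheme collapses to a few lines. The following sketch uses my own names and data layout, not the paper's:

```python
def partition(children_of_top, told_descendants):
    """One partition per direct child of Top, containing its told
    descendants not already claimed by an earlier partition.

    children_of_top:  iterable of the direct children of Top
    told_descendants: dict mapping a concept to the set of its told
                      descendants (including the concept itself)
    """
    partitions = []
    assigned = set()
    for c in children_of_top:
        part = told_descendants[c] - assigned  # drop already-claimed concepts
        if part:
            partitions.append(part)
            assigned |= part
    return partitions
```

If this is indeed the algorithm, stating it in this form (or as the two-sentence description above) would spare the reader the four-parameter recursion entirely.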
Algorithm 9 appears completely out of the blue. In your main concurrent Algorithm 1, there are no workers, no shared queue, and no possible deadlocks or race conditions. Then suddenly, in this one paragraph in Section 6, you start discussing this one rather small detail without first describing anything else about the implementation. I did not understand the motivation here: why mention this one particular problem? Worse, does your solution really work? What if there are only two items in the queue, and two workers repeatedly take one item each and then return it to the queue, forever? Or the same with n items and n workers.
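To make the worry concrete, here is a small deterministic simulation of the schedule I have in mind (the worker loop and queue discipline are my guesses, not your implementation): with n sub-hierarchies and n workers, each worker takes one item, finds the queue empty before anyone has returned an item, and puts its own item back, so no merge ever happens.

```python
from collections import deque

def livelock_rounds(n_workers, rounds):
    """Run `rounds` adversarial scheduling rounds and return the number
    of merges performed. Under this schedule the answer is always 0."""
    queue = deque(f"h{i}" for i in range(n_workers))  # n sub-hierarchies
    merges = 0
    for _ in range(rounds):
        # Phase 1: every worker dequeues its first sub-hierarchy.
        held = [queue.popleft() for _ in range(n_workers)]
        # Phase 2: every worker looks for a second sub-hierarchy to
        # merge with; the queue is now empty, so all of them fail
        # before any worker has returned its item.
        # Phase 3: every worker puts its sub-hierarchy back unchanged.
        for h in held:
            queue.append(h)
    return merges  # no worker ever held two items, so no merge occurred
```

The scheduler may of course never produce exactly this interleaving in practice, but nothing in the paper rules it out, and a correctness argument should.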
As mentioned earlier, I would also like to see what exactly you do with equivalent concepts. Do you have to somehow interleave your algorithm with cycle detection?
What is that expression on line 3 of page 4? Sorry, this really doesn’t make any sense. Did you want to say something like “subsumption tests over all pairs of atomic concepts occurring in the TBox”?
On first reading of Section 3, I did not understand what you meant by the domain \Delta; it became clearer only later, in Section 3.1. Perhaps, in the context of DLs, it is unfortunate to use the term “domain” and even the symbol \Delta, since this is unrelated to the domain of an interpretation. Would “signature” be more appropriate?
