Review Comment:
Review of
Trace-Based Analysis of Large Rule-Based Computations
paper by Terrance Swift
This paper describes a logging tool for XSB Prolog (forest logging)
that can help users in profiling a computation and in identifying the
parts of code that are at the origin of bottlenecks in the
computation. The relevance for the Semantic Web Journal is that tools
intended for the semantic web, such as Flora, Silk, and Fidji have XSB
Prolog underneath and hence that these tools can help users of those
systems in analysing applications that consume unreasonable amounts of
computational resources. The paper includes some convincing examples of
the usefulness of the tool.
That being said, this is a technical paper about the XSB system and
SWJ is not the first place I would look for a paper about profiling
XSB programs. However, I leave the judgement about the scope for the
editors.
Overal, the paper is a valuable technical contribution that addresses
a problem that is relevant for XSB users and deserves to be published
somewhere. However, there are some glitches here and there and the
paper needs a careful revision. Below I give a list of detailed
comments.
One more high level remark: In section 3 a morphism H is defined on a
forest, however, it looks as the concept is not used to reconstruct a
forest from a log file (theorem 3.2). This needs clarification. More
general it is unclear whether every analysis starts with
reconstructing the forest from the log, or whether many questions can
be answered by a more direct interpretation of the info in the
log. Moreover, it looks as the proof (and the code used in it) of
theorem 3.2 is incomplete (or I missed something). This need to be
clarified in a second reviewing round.
DETAILS
Perhaps add "forest logging" to the title, e.g., "Forest Logging: A
Trace-Based Analysis of Large Rule-Based Computations".
p1 right bottom:
effort for the SIlk project.
-->
effort for the Silk project.
In ex 1.1: What stands "AP" for in AP biology questions ?
p 2 right top:
and A analyze logs-->
and analyze logs
footnote 3:
While it is fair to refer to other sources for more details about SLG,
I would recommend to recall in one sentence the idea of local
scheduling.
p3 right
(e.g. children -->
(e.g., children (twice)
Ex 2.1, Fig 1:
After step 16: why 17 is next and not 21?
There is a mutual dependency between both trees: how is it decided in
which tree an extra branch is started? Is it determined by the
scheduling or is it arbitrary in this case?
p4 left def 2.1
is there is a path from -->
if there is a path from
p4 right
unique maximal independent SCC
By "maximal" you mean maximal in the partial order over SCC's?
in ex 2.2
scheduling; and as --->
scheduling; as
Fig 2 (p6)
In the body of the second p clause: nor --> not
The code of the tree seems to be inconsistent with the forest:
In the third clause, to be consistent with the forest, the body should be?
:- t(X,Y,Z), not p(Y), not p(Z). (there is also a superfluous "(")
and the facts for t/3 should be (wrong order and incomplete)?
t(a,a,b).
t(a,b,a).
t(a,c,b).
This makes example 2.2 hard to follow
In the tree for p(b):
If you ignore the early completion and create node 9, you should
continue with it and extend it with a fail node? Only then, p(b) is
completed?
The failure of nodes 6 (node 11) should be before the creation of node
10?
End of example 2.2 (p5) "Since p(a) is completely evaluated with no
answers, conditional or otherwise, the evaluation determines it to be
failed"
This is the "ANSWER COMPLETION" operation of p7?
p5 right
top
"If S_called is the first tabled subgoal called in an evaluation, then
S_called is set to null"
It is S_caller which is set to null.
ANSWER RESOLUTION
you refer to the answer S_called\theta by A but A was not introduced.
NEGATIVE REGURN -->
NEGATIVE RETURN
bottom
(i.e. N is -->
(i.e., N is
p6
COMPLETION
By SCC_index, you mean a unique identifier for the SCC to which S
belongs?
p 7 left
ANSWER COMPLETION
An example of this is the failure of p(a) at the end of ex 2.2.
As said above, use the term ANSWER COMPLETION in the example.
Fig 3 (the log file)
As far as I understand, some things go wrong beyond node 15:
next should be
- answer resolution on node 11, leading to the creation of node 16
- na([1],reach(3,_v0),11) is associated with node 18, not node 17
- answer resolution on node 11, leading to the creation of node 19
- na([1],reach(1,_v0)... associated with node 19
....
p7 right
e.g. computations -->
e.g., computations
p8 def 3.2
"closest parent-child ancestor"
the concept has not been defined.
Why not saying "the closest ancestor of n whose selected literal is
tabled" And give at least one example e.g. in fig 1: H(2) = 1 or
perhaps show the whole condensed forest H(F)
The paragraph "In order to reconstruct an SLG ..."
"the parent of each logged fact f needs to be determined"
you mean "each entry in the log file"?
The whole paragraph is not very clear, I wonder whether you can use Figure 1
and its log file in Figure 3 to point out more precisely what is the
issue.
Also, the theorem claims you can reconstruct the trees of the original
forest and does not mention H(F). Neither does the proof. So, what is
the point of introducing H(F) if there is not a two phase process that
first uses the log to reconstruct H(F) en then H(F) to reconstruct(F)?
Definition 3.3
It is not very intuitive and hence not very elegant to say that two
empty bodies are distinguishable.
you would better say e.g. "and the sequences L2, ..., Ln and L′2, ..., L′n
are either distinguishable or empty" ??
case 1 and 2 in the definition: this is "either 1 or 2" (?)
Perhaps you can illustrate with an example what can go wrong when Li
unifies with a literal in Body'?
(as to motivate the requirement "distinguishable")
Theorem 3.2: You are using S without introducing it (Guess it refers
to the root node)
So, the theorem says that, (if enough is tabled) the forest can be
reconstructed.
But in the next sections, it is unclear for which parts of analysis it
is needed to reconstruct the full forest, or whether many answers can
be given by a more direct analysis of the log.
p 9 right
Proposition 3.2 -->
Theorem 3.2
p10
fig 4
582754 were calls to new subgoals
4460609 were calls to incomplete subgoals
3594936 were calls to complete subgoals
right aligning the numbers ie.
582754 were calls to new subgoals
4460609 were calls to incomplete subgoals
3594936 were calls to complete subgoals
would enhance readability
Discussion of analyze_an_scc(39):
one can read between the lines that edges correspond to calls to
predicates with incomplete tables, hence does not refer to the forest,
but it is not said to what it refers: to the subgoal dependency graph?
to H(F)?
p11 left
conists -->
consists
p12 left
it can be useful understand -->
it can be useful to understand
right
written into a internal buffers -->
written into internal buffers
p13 right
for the both the-->
for both the
p14
itself, when logging was turned on the left-recursive-->
itself, when logging was turned on, the left-recursive
footnote
but its queries not as instantiated. -->
but its queries are not as instantiated.
Was not Mireille Ducasse one of the pioneers of trace based analysis
for Prolog?
starting with: Mireille Ducassé: Opium+, a Meta-Debugger for
Prolog. ECAI 1988: 272-277
She deserves a few more refs?
p15
"under certain conditions has the
information available to construct a homomorphic im-
age of an SLG forest. "
I do not understand: none of your theorems mentions "homomorphic
image"
ref [12]
Justificatinos -->
Justifications
p18 right
you are mentioning "Body'" (four times)
You should introduce Body' as the part of the body following L
Body' is also used on p 20.
The code on p 19 has similar problems:
Body_{Right} is introduced, but further on Body' is used
line 33: N should be Node
line 36 and line 38:
you are adding an element to a set, so you should write
S :- S \cup \{element\}
line 42: "Delay - Lit"
You mean "Delays \setminus \{S_called\}" ?
l 45 should be S := S \setminus \{f\}
p20
The next case (lines 24-32 of Figure 8) L is
-->
In the next case (lines 24-32 of Figure 8), L is
I see two cases in the code on p20:
- there is a leftmost tabled literal in the body
- the body is empty
But the case where the body has only non-tabled literals is
missing (and at some point, only those literals are left in the
created "Child", they can become further instantiated but are never
removed?)
If you really reconstruct the original forest as the theorem claims,
you should handle the non-tabled calls and use resolution with the
program clauses to solve them? If the aim is to reconstruct H(F),
things might be different but need also clarification.
|