Review Comment:
I thank the authors for their feedback on my review. I am mostly satisfied,
except for my last point about the ordering of the triplets.
Consider the WebNLG sentence: "A.E Dimitra Efxeinoupolis is located
in Greece, the capital of which is Athens.".
Two triplets are extracted:
<"A.E_Dimitra_Efxeinoupolis", "location", "Greece">
<"Greece", "capital", "Athens">
If we reorder the triplets, they become
<"Greece", "capital", "Athens">
<"A.E_Dimitra_Efxeinoupolis", "location", "Greece">
If we use ROUGE *across* triplets, then one of the bigrams in the first
case is "Greece, Greece", while in the second case it is instead
"Athens, A.E_Dimitra_Efxeinoupolis". It was not clear to me from the
paper, and it is still not clear from the reply notes, whether the ROUGE
metrics are computed across triplets, that is, whether the object of one
triplet is paired with the subject of the next extracted triplet. If that
is the case, then the metric can be manipulated by reordering the
triplets (whenever there is more than one triplet).
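To make the concern concrete, here is a minimal sketch. It is not the
authors' evaluation code: it assumes the triplets are flattened into a
single token sequence, and it uses a simplified bigram-overlap F1 as a
stand-in for ROUGE-2 rather than the 'evaluate' implementation. The helper
names (linearize, bigrams, bigram_f1) are my own.

```python
from collections import Counter


def linearize(triplets):
    """Flatten a list of triplets into one token sequence (assumed serialization)."""
    return [token for triplet in triplets for token in triplet]


def bigrams(tokens):
    """Return all adjacent token pairs, including pairs that span two triplets."""
    return list(zip(tokens, tokens[1:]))


def bigram_f1(pred_tokens, ref_tokens):
    """Simplified ROUGE-2-style F1: clipped bigram overlap between two sequences."""
    pred = Counter(bigrams(pred_tokens))
    ref = Counter(bigrams(ref_tokens))
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


t1 = ("A.E_Dimitra_Efxeinoupolis", "location", "Greece")
t2 = ("Greece", "capital", "Athens")

reference = linearize([t1, t2])
same_order = linearize([t1, t2])
swapped = linearize([t2, t1])

# Bigram that exists only because of the order in which the triplets are joined.
lost_bigrams = set(bigrams(reference)) - set(bigrams(swapped))

score_same = bigram_f1(same_order, reference)
score_swapped = bigram_f1(swapped, reference)
print(lost_bigrams, score_same, score_swapped)
```

Under these assumptions, the same set of triplets scores lower once the
order is swapped, solely because of the bigram that crosses the triplet
boundary. Whether this applies to the manuscript depends on how the
predictions and references are actually serialized before scoring.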
Reading Figure 6, Appendix A, I see that the triple (Ciudad Ayala # country #
Mexico) is first in the output, while the same triple is in the middle
of the list. Would moving that triple to the "correct" position result
in different ROUGE-2 and ROUGE-L scores? As far as I can read the code, the
authors use the 'evaluate' Python library for testing with ROUGE.
https://github.com/jrcf7/txt2graphLLMs/blob/main/scripts/utils/predictio...
However, I have not examined in what data format the 'compute' method
is called.
In graphs, the ordering of the triplets is usually ignored. Therefore,
if the computation of the ROUGE (-2 and -L) scores is order-dependent, this is highly
unusual and should be noted in the manuscript.
As a side note, I would like to point out that Reviewer 3 points to a
Piglou paper, but the authors point to a Kuhn paper, reference [6].