DBpedia and Wordnet in Japanese

Tracking #: 418-1541

Authors: 
Hideaki Takeda

Responsible editor: 
Guest editors Multilingual Linked Open Data 2012

Submission type: 
Dataset Description
Abstract: 
Both WordNet and Wikipedia are valuable language resources covering wide domains so that an RDF version of WordNet and DBpedia play important roles in the LOD cloud. Combining them provides the basic resources for our linguistic and ontological knowledge. However, the conversion to RDF should be carried out differently for each resource because of each own lineage and characteristics. The idea of LOD should be useful to connect them. We built and published RDF of the Japanese Wordnet and DBpedia Japanese and furthermore provided the basic links between both. We expect that they will be used as the infrastructure to enrich and link other Linked Data datasets in Japan.
Tags: 
Reviewed

Decision/Status: 
Reject and Resubmit

Solicited Reviews:
Review #1
By John McCrae submitted on 04/Feb/2013
Suggestion:
Reject
Review Comment:

This paper describes a dataset that combines Japanese WordNet and DBpedia. While this is a worthwhile project, the paper describes only the mapping between the resources and this is not well performed.

With respect to the three criteria for dataset descriptions, this paper is insufficient.
1/ Quality: The resource describes linking between the two datasets; however, these links are only constructed based on the lemma (string) form of the entry. While I am aware that this is less ambiguous in Japanese than in other languages due to the use of Kanji, there are still potential mistakes. The authors do not attempt to evaluate the correctness of these simple mappings.
2/ Usefulness: As the mapping is based primarily on simple string matching, it seems that it would be very easy for another linked data consumer to duplicate this mapping. As such it is not clear whether this saves users of these resources much effort.
3/ Clarity and Completeness: In many places it is not clear why certain choices were made. For example, the authors chose to use skos:closeMatch instead of owl:sameAs. While I agree owl:sameAs would be unsuitable for linking here, it is not clear why "close match" was chosen. The only justification I can see is "it is actually an OWL individual without such a definition that specifies it to owl:Class", which I do not understand (owl:sameAs is for individuals, and I would not expect DBpedia concepts to be classes). I would recommend that the authors consider a model like lemon, or the ongoing work of the W3C OntoLex community group on linking machine-readable dictionaries with ontologies.
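
To make the concern in points 1/ and 2/ concrete, the following is a minimal sketch of lemma-based linking, assuming exact string equality between a synset lemma and a DBpedia label. It is not the authors' implementation. The example.org URIs, the toy data, and the function names link_by_lemma and ambiguous_targets are all hypothetical; only the skos:closeMatch property URI is real.

```python
from collections import defaultdict

SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

# Hypothetical toy data; none of these URIs come from the reviewed datasets.
SYNSET_LEMMAS = {
    "http://example.org/wn-ja/synset-jaguar-animal": ["ジャガー"],
    "http://example.org/wn-ja/synset-jaguar-car": ["ジャガー"],
    "http://example.org/wn-ja/synset-tree": ["木", "樹木"],
}
RESOURCE_LABELS = {
    "http://example.org/dbpedia-ja/resource/ジャガー": "ジャガー",
    "http://example.org/dbpedia-ja/resource/木": "木",
}


def link_by_lemma(synset_lemmas, resource_labels):
    """Return sorted (synset, skos:closeMatch, resource) triples for exact lemma/label matches."""
    by_label = defaultdict(list)
    for resource, label in resource_labels.items():
        by_label[label].append(resource)
    triples = set()
    for synset, lemmas in synset_lemmas.items():
        for lemma in lemmas:
            for resource in by_label.get(lemma, []):
                triples.add((synset, SKOS_CLOSE_MATCH, resource))
    return sorted(triples)


def ambiguous_targets(triples):
    """Return resources that are linked from more than one synset."""
    sources = defaultdict(set)
    for synset, _, resource in triples:
        sources[resource].add(synset)
    return {resource for resource, synsets in sources.items() if len(synsets) > 1}


if __name__ == "__main__":
    links = link_by_lemma(SYNSET_LEMMAS, RESOURCE_LABELS)
    for s, p, o in links:
        print(f"<{s}> <{p}> <{o}> .")
    print("Targets reached from several synsets:", ambiguous_targets(links))
```

In this toy input, two synsets that share one lemma both receive a link to the same resource, which is the kind of case that string matching alone cannot resolve and that an evaluation would need to quantify.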

Finally, the grammar in the paper is very poor and frequently difficult to understand.

Review #2
By Yoshihiko Hayashi submitted on 20/Feb/2013
Suggestion:
Minor Revision
Review Comment:

This paper describes (1) Japanization and RDFization of two representative resources: WordNet and DBpedia, and (2) their linking as LOD. The resulting combined resource can definitely be an important and useful one for intelligent computational processes such as multilingual natural language processing or semantic search. The current description of the resource, as described in the draft, could be further improved by considering the following points.

(1) Detailed information about the Japanese WordNet and DBpedia Japan (as specified in the "Submission Types" section of http://www.semantic-web-journal.net/authors) has to be presented in an organized way. This can improve the completeness of the descriptions.

(2) There is almost no self-evaluation of the results summarized in Tables 1 and 2. This could lead to a situation where potential readers cannot judge the usefulness of the resulting resource. The authors should state whether these results are satisfying/promising or not. Besides, the paper should discuss possible methodologies to improve the results, if the current figures do not achieve the desired levels of coverage and accuracy.

(3) As far as the reviewer sees, one of the technically interesting decisions is the use of skos:closeMatch instead of owl:sameAs. Although footnote 5 discusses this choice in one way, the discussion could be further detailed, probably by referencing related work such as (Gracia et al., 2012).
- Ref: J. Gracia, et al. "Challenges for the multilingual web of data.", Journal of Web Semantics, 11:63-71, 2012.

The following are less important and/or detailed issues.

[p.1, left]
- The intention and meaning of the last two sentences in the first paragraph are unclear. What is meant by "incoherent to development in other languages"? Besides, the intention behind the introduction of EDR here is unclear to the reviewer.
- It should be noted that the originators of WordNet say (Miller and Fellbaum, 07) that Princeton WordNet is not an ontology.
- Miller, G., and Fellbaum, C. (2007). WordNet then and now. LR&E, 41:209-214.

[p.1, right]
- The reviewer is not very sure that the use of the term "cross-media data" is appropriate for the context here.
- NICT Japanese WordNet: "NICT" should be mentioned in the place where the Japanese WordNet is first introduced, p.1, right-top.

[p.2, left ~ right]
- The content of section 2.2 can be included in section 2.3. It was a bit odd for the reviewer to see an independent section was given to this topic.

[p.2, right]
- The issue described by the last sentence of section 2.3 could be an issue of WN-ja, not an issue of conversion to RDF.
- In section 3, the notion of "ontology building in DBpedia Japan" is quite hard to understand, making it difficult for potential readers to interpret the results in Table 1.
- The reason why the mapping rates for Japanese were around 50% of that of English should be discussed.

[p.3, left]
- The logic behind the last sentence of the first paragraph is hard to follow. What are the connections between "to link literally" and the "basic dataset"?
- The reviewer supposes that "to link literally" can achieve high precision (accuracy) at the cost of recall (coverage). The results shown in Table 1 may reveal low recall. Nevertheless, perfect accuracy would not be attained even with the "to link literally" strategy, due to potential semantic ambiguities. Therefore some evaluation of the accuracy, even if it is partial, would be necessary to claim the usefulness of the resulting resource.
- The meaning of the last sentence of section 4.1 is unclear: maybe "over" should be deleted? Even so, the phrase, "the viewpoint of OWL, syntactically and semantically", should be properly explained.
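
A partial accuracy evaluation of the kind requested above could be as small as the following sketch. It assumes a random sample of links has been judged correct or incorrect by hand, and it reports the sample precision with a 95% Wilson score interval. The function names and the sample figures are hypothetical and are not results from the paper.

```python
import math


def sample_precision(judgements):
    """Fraction of manually judged links marked correct (True)."""
    if not judgements:
        raise ValueError("at least one judged link is required")
    return sum(1 for ok in judgements if ok) / len(judgements)


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical sample: 100 links inspected, 90 judged correct.
    judged = [True] * 90 + [False] * 10
    low, high = wilson_interval(sum(judged), len(judged))
    print(f"precision = {sample_precision(judged):.2f}, 95% interval = ({low:.3f}, {high:.3f})")
```

Reporting the interval alongside the point estimate lets readers see how much a small sample can support the claim.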

[p.3, right]
- The text structure of section 4.3 may be flawed, especially the first two paragraphs. The materials there should be reorganized to improve the readability.
- In the first paragraph of section 4.3, the wording "needless ambiguity" or "unnecessary ambiguity" could introduce "unnecessary misunderstanding." It might be far clearer and fairer to simply say, for example, "simply selected only noun concepts for interlinking."

[p.4, left]
- "CC-BY" and "CC-BY-SA" should be properly described.

Review #3
By Dimitris Kontokostas submitted on 08/Mar/2013
Suggestion:
Major Revision
Review Comment:

This paper provides the description of two important datasets for Japanese: Wordnet and DBpedia. The authors describe how these datasets were published as Linked Open Data and how they were interlinked to each other. They both provide a small background, a description of the datasets and the license they are under. Overall, both datasets seem very promising, however, in order to accept this paper a major revision is required.

Taking the “Linked Dataset Descriptions” submission type as a guide [1] and the fact that you have extra pages left to fill, you could revise your manuscript according to the following:

* Explicitly add URLs for your datasets along with an example URI to test linked data access. From my understanding, Wordnet is not served as Linked Data, but you do not state that explicitly either.

* Provide a separate table for each dataset with detailed statistics. Such statistics could include total triples, triples per namespace, external links or statistics on specific instance types (e.g. persons / places / ... for DBpedia or nouns / hypernyms / ... for Wordnet).

* Append an extra paragraph for each dataset mentioning similar versions of your datasets for other languages (beyond English), e.g. for DBpedia an Internationalization page exists [2] and there are a couple of other wordnets in other languages [3].

* Mention possible applications for each dataset as requested by criteria "(2) Usefulness (or potential usefulness) of the dataset".

* For the DBpedia dataset the authors do not provide much information regarding the dataset description, the creation process and the ontology used. For Table 1 the authors could add a reference (as a footnote) to the DBpedia mappings statistics page [4].

* Regarding the links to the English DBpedia the authors do not mention how they were generated. For instance [5, Section 5.1] makes a suggestion. Did the authors follow the same approach or a different one?

* Sections 4.1 & 4.2 could be partially merged with sections 2 & 3 respectively.

* Writing could be further improved by a professional editor or a native speaker.
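
As an illustration of the statistics suggested in the list above (total triples and triples per namespace), here is a minimal sketch that counts predicates by namespace in an N-Triples dump. It assumes one well-formed triple per line and splits each predicate IRI at its last '#' or '/'. The sample lines, the example.org URIs, and the function names are hypothetical.

```python
import re
from collections import Counter

# Subject (IRI or blank node), predicate IRI, then the rest up to the final dot.
TRIPLE_RE = re.compile(r'^(<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def namespace_of(iri):
    """Return the IRI up to and including its last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def predicate_namespace_counts(lines):
    """Count triples per predicate namespace, skipping blank, comment, and unparsable lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE_RE.match(line)
        if match is None:
            continue
        counts[namespace_of(match.group(2))] += 1
    return counts


# Hypothetical sample input, not taken from the reviewed datasets.
SAMPLE_LINES = [
    '<http://example.org/a> <http://www.w3.org/2004/02/skos/core#closeMatch> <http://example.org/b> .',
    '<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "ラベル"@ja .',
    '<http://example.org/c> <http://www.w3.org/2004/02/skos/core#prefLabel> "x"@ja .',
    '# a comment line',
]

if __name__ == "__main__":
    stats = predicate_namespace_counts(SAMPLE_LINES)
    print("total triples:", sum(stats.values()))
    for namespace, count in stats.most_common():
        print(count, namespace)
```

For a real dump, the same function can be fed an open file object instead of a list, since it only iterates over lines.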

Typos:
p.1 ... so that an RDF version ... -> ... so that RDF versions ...
p.1 ... we have been realized ... -> ... we have realized ...
p.1 ... we have made the conversion ... -> ... we have the conversion ...
p.2 ... de Melo and Weikum has made ... -> ...de Melo and Weikum made ...
p.3 ... every entities ... -> ... every entity ...
p.4 RDFized ... -> The RDFized ...
p.4 It is now available the dumped files -> (needs rephrasing)

References
[1] http://www.semantic-web-journal.net/reviewers
[2] http://dbpedia.org/Internationalization
[3] http://datahub.io/dataset?q=wordnet
[4] http://mappings.dbpedia.org/server/statistics/ja/
[5] D. Kontokostas, C. Bratsas, S. Auer, S. Hellmann, I. Antoniou, and G. Metakides. Internationalization of linked data: The case of the Greek dbpedia edition. Web Semantics: Science, Services and Agents on the World Wide Web, (0), 2012.