Tag as you like…, we can understand you!

Samia Beldjoudi, Hassina Seridi, Catherine Faron-Zucker
Nowadays, approaches that combine Semantic Web ontologies and Web 2.0 technologies constitute a significant research field that has attracted the interest of many researchers. In this paper we present an original approach concerning a technology that has gained great popularity in recent years: folksonomies. Our aim in this contribution is to introduce a new kind of reasoning over Social Semantic Web technologies, in order to show how the problem of tag ambiguity in folksonomies can be overcome automatically, even when folksonomies are represented with ontologies. We also show how any folksonomy can be enriched with a set of relevant data that improves and facilitates resource search in these systems, while tackling another problem from which this technology suffers: spelling variations.
Submission type: 
Full Paper
Responsible editor: 
Guest Editors

Solicited review by Vera Hollink:

The paper describes a method to recommend resources to users who search a collection of resources that are tagged by other users. The authors aim to tackle the problems of tag ambiguity and spelling variations.

Although the paper addresses an important problem, in my opinion it is not suitable for publication in the SW journal. The main problems of this paper are:

- It is very poorly written and needs to be checked by a native speaker. Also, the paper as a whole is not well structured. For instance, the goal of the paper (handling tag ambiguity and spelling variations) is mentioned many times at inappropriate locations. In the results section (4.3), sentences appear that explain part of the method (e.g. "And in order to avoid the case .... our approach give him these resources at the end ... problems").

- Although the task that is addressed is a recommendation task, the literature on (tag) recommendation is not reviewed. A good starting point would be:
Maarten Clements and Arjen P. de Vries and Marcel J.T. Reinders "The task dependent effect of tags and ratings on social media access",
ACM Transactions on Information Systems (TOIS), 2010.

- Semantic web technologies, such as RDF and RDFS/OWL, are mentioned various times (especially in Section 3.4), but the role of these technologies for the task that is addressed is unclear.

- The description of the proposed method is very long, but it remains unclear what happens exactly. For example, how are the similarities between resources computed? A structured overview of each step of the method would help.

- Evaluation is done on a small set of tags of del.icio.us users. Precision and recall scores are provided, but in order to judge the value of the proposed method they need to be compared to other methods or at least a baseline. Alternatively, the method could be evaluated on a standard data set.

Solicited review by Matthew Rowe:

This paper describes an approach to solve tag ambiguity by utilising community annotations of tagged resources. The authors present their work within the realm of domain user profiling in order to improve the retrieval of relevant content to each user's information needs.

Overall I found the paper hard to read and confusing at times. The evaluation is poorly explained, and several of the claims that the authors state their approach addresses are left unjustified. In particular, the central contribution of the paper, that the use of community tagging reduces ambiguity and thus improves the retrieval of relevant information, is left unproven: the authors fail to compare their approach against the accuracy of a baseline approach that does not use community tagging to reduce ambiguity. As it stands I vote to reject the paper, but should the authors follow the advice listed below, an amended version of the paper would stand a better chance of acceptance and make the authors' arguments more compelling.

Section 1:
-The motivation of the whole paper lacks a clear example, and hence grounding. Maybe this is a problem with this field, but after reading the introduction I was unsure how bad the problem the authors are solving is, or indeed why it needed solving in the first place. A concrete example from the outset would rectify this and ground the reader's understanding.
-The authors frequently refer to 'search' when I think they mean 'research'. See the opening sentence for example.
-I would argue that combining Web 2.0 and the Semantic Web does not constitute a new line of research: the Social Data on the Web workshop is now in its 4th year, and Peter Mika's seminal work on folksonomies was largely published in 2005 - 6 years ago!

Section 2:
-The authors propose that reasoning is a useful means through which tag ambiguity can be reduced. However the related work section, although well presented, does not conclude with motivations as to how reasoning could help and why it is needed.

Section 3:
-When dealing with misspellings, the authors use the Levenshtein distance, however this is merely a similarity measure and does not provide corrections to words. How do the authors use it for corrections? Do they compare a given tag with every other tag in the system for the minimal edit distance?
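The reviewer's point, that Levenshtein distance is only a similarity measure, can be made concrete. One plausible reading of how it might be used for correction (the function names, the threshold, and the compare-against-every-tag strategy are my assumptions, not a description of the authors' method) is to compare an unseen tag with every known tag and substitute the nearest one within a small edit-distance budget:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def correct_tag(tag: str, vocabulary: set, max_distance: int = 2) -> str:
    """Return the closest known tag within max_distance, else the tag itself."""
    if tag in vocabulary:
        return tag
    best = min(vocabulary, key=lambda t: levenshtein(tag, t))
    return best if levenshtein(tag, best) <= max_distance else tag

print(correct_tag("pyhton", {"python", "java", "mac"}))  # python
```

Note that this brute-force scan over the whole tag vocabulary is exactly the cost the reviewer is asking about; the paper should state whether this, or something cheaper, is what is actually done.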

Section 4:
-The evaluation is poorly explained and does not assess the utility of the approach against a baseline, thereby preventing comparison with existing methods. It is not clear what is being assessed: are the authors assessing the accuracy of the resources returned to each user?
-How were the users who used ambiguous tags chosen? How was ambiguity decided? Was there a method behind this? If so, it should be explained.
-The second class of users consists of users who did not use ambiguous tags but would receive them in the future; again, how was this decided?
-The authors previously state that their approach reduces search time, yet this is not mentioned in the evaluation section. I was expecting a task to be described in which the utility of the approach was shown in terms of reducing query times for users, if that is indeed what the authors are interested in assessing.
-It would have been interesting to compare the results against the null hypothesis where every resource tagged with 'apple' is returned, for example. This would yield 100% recall but a reduced level of precision. One would expect an increase in precision using the presented approach but a decrease in recall. If improving precision is the goal - this is cited in Objective 3 - then this would provide a useful comparison.
-There is no analysis of the results: the authors just present 2 graphs with no explanation of how well their method performed. Comparison against a baseline, such as the null hypothesis above, would provide enough data for a discussion of the results.
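The suggested null-hypothesis comparison is straightforward to operationalise. A minimal sketch of the precision/recall trade-off the review describes (the resource identifiers and set sizes are invented purely for illustration):

```python
def precision_recall(returned, relevant):
    """Standard set-based precision and recall."""
    returned, relevant = set(returned), set(relevant)
    tp = len(returned & relevant)
    precision = tp / len(returned) if returned else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: five resources tagged 'apple', of which only
# r1..r3 match the user's computing-oriented profile.
all_apple = {"r1", "r2", "r3", "r4", "r5"}   # null hypothesis: return everything
relevant  = {"r1", "r2", "r3"}               # relevant to this user
filtered  = {"r1", "r2"}                     # a disambiguating recommender

print(precision_recall(all_apple, relevant))  # (0.6, 1.0): full recall, low precision
print(precision_recall(filtered, relevant))   # (1.0, ~0.67): higher precision, lower recall
```

Reporting both points would let readers see whether the presented approach actually buys precision at an acceptable cost in recall.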

Solicited review by Laura Hollink:

This paper discusses the problem of tag ambiguity for recommendation. Although the problem is very relevant to the special issue, I cannot recommend publication.

-The paper is hard to read, both in terms of grammar and structure of the text.

-The scientific contribution of the paper is not clear. The authors say that the "new which is offered in this work" is the fact that the approach is not limited to a specific folksonomy or a particular ontology. However, this is true for a lot of work on recommendation (e.g. FolkRank).

-The approach is not described in full. How is the similarity between resources calculated? What is the role of the ontology that is mentioned several times, and what does this ontology look like exactly?

-The evaluation does not give much insight into the quality of the approach as the results are not compared to other systems or a baseline. The authors give recall and precision figures, but I don't understand how the relevance of each recommended resource is determined. Section 4.3 says:

"The number of R [relevant resources] for each user is calculated according to the profile of this one in the folksonomy. For example we take the case of the user who is identified by this list of tags {java, computer, mac}, in our evaluation we have supposed that the preferences of this latter are similar to a computer sciences field and not to the food when he did a search by the keyword apple. And so all the resources that are close to the first domain are considered relevant to this user and they are proposed to him with a highest degree of recommendation."

However, this seems to be a description of the recommendation approach, not of relevance assessments.

-Section 4.2 shows graphs of the networks of users and the tags they used, users and the resources they tagged, and resources and their tags. Since the paper does not discuss these graphs, it is not clear what a reader can learn from them.