Review Comment:
This paper extends a class of shilling attacks, named semantics-aware shilling attacks (SAShA), that leverage semantic features extracted from publicly available knowledge graphs. While the core idea originally comes from the same authors, this paper adopts and applies it to a broader range of recommender models and attack strategies. In addition, more metrics are considered to cover different aspects of the similarity between resources.
The main strengths of this work are as follows: (i) the quality of the research, which is original, well presented, and contains contributions that advance the state of the art on integrating semantics into shilling attacks, including a deep neural recommendation model; (ii) the solid study of the state of the art, performed systematically and documented in an organised and well-structured manner; (iii) the motivation of the research, which is discussed in good detail and leads precisely to the proposed research idea; (iv) the comprehensive experimental evaluation conducted to investigate whether SAShA is more effective than baseline attacks against Collaborative Filtering models, taking into account the impact of various semantic features. Experimental results on two real-world datasets show the usefulness of the proposed strategy.
All these strong points have resulted in a nicely documented first four sections, which need only a few typo and language corrections. Some examples of these minor issues can be found in the “Minor corrections” subsection of this review.
Despite all the above-mentioned positive points, as noted in the paper, “given the extent of experiments carried out in the experimental section, it could be hard to decipher this information at first glance”. Therefore, more precise analysis, consideration, and discussion are needed.
Although the insights obtained from the experimental results are interesting, there are some inconsistencies and exceptions in the analysis of these results, especially in Section 5, which make these discussions and their corresponding conclusions fragile and confusing. Therefore, this section needs a major amendment and revision, in my opinion. I reviewed this section carefully and highlighted its contentious parts as follows.
Highlighted points for major revision/correction:
•“the results obtained on the Yahoo!Movies dataset (Table 5) are more indicative of attacks’ effectiveness independently of the attack strategy, the number of injected profiles, and recommender models”. (Section 5.1 - Page 13 - column 1 - Lines 45-50)
Comparing the results of the experiments in both tables, this conclusion sounds very generalised and fragile. It is not clear to the reader how the authors arrived at this conclusion, or which parts of Table 5 differentiate the effectiveness of the attacks compared with Table 4.
•“Furthermore, Table 4 also confirmed the semantics aware strategy’s efficacy over the baseline, either for the average and random attacks. For instance, the semantic strategies outperformed all the <…> baseline attacks independently of the recommender model and the size of attacks” (Section 5.1 - Page 13- column 2- Lines 7 -13)
There are, however, several exceptions in Table 4 showing that some semantic strategies could not outperform the baselines. For example:
<…> with all similarity measures could not outperform the baseline, while <…> and <…> can beat it. These exceptions contradict the above-mentioned claim.
•“However, it is worth mentioning that, differently from the results on Yahoo!Movies, on <…>, the baseline attack’s effectiveness did not improve” (Section 5.1 - Page 13- column 2- Lines 13 -16 )
Again, some exceptions show that <…> did improve the baseline attack’s effectiveness, such as <…>. Also, in <…>, there are some cases, such as <…> for the recommendation models, which did not improve over the baseline at any attack granularity. These exceptions again contradict the claim mentioned above.
•“We can observe that the adoption of graph-based relatedness generally leads to an attack efficacy improvement over the baseline, which adopts cosine similarity metric.” (Section 5.2 - Page 13- column 2- Lines 49 -52)
Again, this is a very general claim, and it is not clear what “graph-based relatedness” means in the above sentence. Does it mean the “semantic features” or the “relatedness-based measures”?
•“The general observation here is that in majority of the experimental cases, the adoption of relatedness-based semantic information leads to improvement of the attacks’ effectiveness” ( Section 5.2 - Page 14- column 1- Lines 36 -39)
What does “relatedness-based semantic information” mean? Does it mean “relatedness-based measures”? There is also inconsistency in using identical or similar phrases in different places without proper definition, which undermines the paper’s self-containedness.
•“We may observe the same behavior for the Yahoo!Movies dataset in Table 5, in which the HR for <1H, User-kNN, Random, Categorical, Katz> is 10% better than the baseline, i.e., 0.3725 vs. 0.3512” (Section 5.2 - Page 14- column 1- Lines 39 -43)
The baseline for <…> is 0.3624; therefore, 10% is not the correct improvement over the baseline. Should this be revised? Or does it mean Katz vs. cosine?
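A quick back-of-the-envelope check (using the HR values as quoted above, and assuming “better” means relative improvement) shows that neither reading yields 10%:

```python
# Relative HR improvement, using the values quoted in this review.
# Neither reading of "10% better" holds up.

def rel_improvement(attack_hr: float, baseline_hr: float) -> float:
    """Relative improvement of attack HR over baseline HR, in percent."""
    return (attack_hr - baseline_hr) / baseline_hr * 100

katz_hr = 0.3725
cosine_hr = 0.3512      # the value the paper compares against
baseline_hr = 0.3624    # the actual baseline, per this review

print(f"vs. 0.3512 (cosine):   {rel_improvement(katz_hr, cosine_hr):.1f}%")    # ~6.1%
print(f"vs. 0.3624 (baseline): {rel_improvement(katz_hr, baseline_hr):.1f}%")  # ~2.8%
```

Either way, the reported 10% figure does not follow from the numbers given, so the sentence should be corrected or clarified.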
•“Beyond random attacks, we can observe some general trends also for informed attacks. In detail, Table 4 (LibraryThing), we note that categorical information improves both User-kNN and Item-kNN.” (Section 5.2 - Page 14- column 1- Lines 43-47)
Again, not in all cases; the exceptions are <…>, <…>, <…>, and <…>.
It should also be mentioned that this trend holds only for Average attacks, not for Bandwagon attacks.
•“It is worth noticing that the same consideration does not hold for latent factor-based models. MF and NeuMF suit better cosine vector similarity.” (Section 5.2 - Page 14- column 1- Lines 47 -49)
What does “same consideration” mean here? If it refers to <…> and <…>, the claim is not true, because <…> and <…> outperformed all the others.
•Section 5.2 - Page 14- column 2- Lines 33 -51
It is not apparent which insight about the Bandwagon attack is being discussed in this paragraph. Consequently, invoking the seemingly irrelevant popularity measure as justification feels disconnected.
•“All the experimental datasets and all the recommendation models clearly show this effect.” (Section 5.2 - Page 15- column 1- Lines 33 -34)
It is not clear which effect is being discussed.
•Section 5.2 - Page 15- column 1- Lines 35 -46
The definition of the hypothesis, the relevance of the cited examples as evidence for it, and the conclusion drawn from them are unclear and do not make sense.
•“We start focusing on Categorical knowledge. The experiments on LibraryThing show that Exclusivity is probably the relatedness that best suits this information type”. (Section 5.2 - Page 15- column 2- Lines 40 -43)
It seems the result is not that solid and precise for LibraryThing either; there are several exceptions across the different recommendation models and attack types.
•“In detail, we found that with low-knowledge attacks, the best relatedness is Exclusivity for LibraryThing and Katz for Yahoo!Movies. With informed attacks, the best relatedness metric is the cosine similarity. However, for the sake of electing a similarity that better suits Factual information, we can note that Exclusivity generally leads to better results with LibraryThing.” (Section 5.2 - Page 15- column 2- Lines 49 -51)
All possible cases of <…> and <…> show that this conclusion is not true.
•“Regarding Yahoo!Movies, the first and foremost consideration we can draw is that graph-based relatedness measures seem to have no positive impact when exploiting a double-hop exploration” (Section 5.2 - Page 16- column 2- Lines 32 -36)
What does “graph-based relatedness measures” mean? It seems that double-hop exploration of Factual information does have a positive impact with the Exclusivity measure for average and bandwagon attacks, and there are other such cases as well.
•“Indeed, in most cases, we can observe a minimal variation for the double hop performance” (Section 5.2 - Page 16- Column 2- Lines 40 -42)
How was the range of “minimal variation” determined? It would also be interesting to know why NeuMF on LibraryThing shows considerably larger positive and negative variations.
•“Beyond graph-based relatedness, we observe that cosine vector similarity almost always shows an improvement when considering second-hop features (particularly with Ontological and Factual information)” (Section 5.2 - Page 16- Column 2- Lines 50 -51)
What does “almost always” mean? This is not a precise insight. All possible cases of <…> show that this conclusion is not valid.
•Section 5.2 - Page 17- Column 1- Lines 21 -38
Although these intuitions for answering RQ4 sound theoretically reasonable, no solid evidence from the experimental results is provided to support these outcomes.
All the above points show a lack of consistency and clarity in the discussion and conclusions of the experimental results.
While the authors specified a declarative format to identify any attack combination (Section 5.1 - Page 13- Column 1 - Lines 31 -35), they rarely use this format in the rest of the paper. I believe, and have shown in my comments, that using this format would bring more clarity to the discussed items, and I highly recommend exploiting it in the revision of the paper.
Suggestion for minor corrections:
I came across some minor typos and syntactic errors during this review, which are listed as follows:
•“For this purpose, we compute semantic similarities/relatedness between the items in the catalog e the target item using KG-based features (cf. Section 3.1)” (Section 3.3 - Page 9- Column 2- Lines 18 -21)
•“The baseline attack leverages the mean and variance of the ratings, which is then used to sample each filer item’s rating from a normal distribution built using these values.” (Section 3.3 - Page 9- Column 2- Lines 28 -31)
•“However, similarly to the previous two semantic attack extensions,” (Section 3.3 - Page 9- Column 2- Lines 42 -43)
•“we describe the the experimental evaluation and provide details necessary to reproduce the experiments” (Section 4 - Page 10- Column 1- Lines 3 -5)
•“Following the evaluation procedure used in Mobasher et al. [4, 88],” (Section 4.4 - Page 12- Column 2- Lines 38 -39)
•“All the results are computed for top-10 recommendation??”. (Section 5 - Page 13- Column 1- Lines 21 -22)
•“In this section, ??? devote ourselves to provide a more in-depth discussion about the impact of several factors involved in the design space of the proposed semantics-aware shilling attacks against CF models.” (Section 5.2 - Page 13- Column 2- Lines 30 -33)
Although I firmly believe this study would benefit the research community, and that researchers can employ its findings and apply their expertise to improve it along the suggested future research directions, I cannot accept the current version before the suggested major corrections are applied, especially in Sections 5 and 6.
I very much look forward to reading the revised version of this paper soon.
Best of luck.