Review Comment:
Summary
The paper studies the problem of hate speech detection on Twitter. This work extends the authors' previous conference paper in several ways. It examines the evaluation of hate speech detection, reasoning about the importance of focusing on the results for the hate class. It identifies as the main challenges the imbalance between the hate and non-hate classes in the currently available public datasets and the lack of distinctive features in the data. The study computes some ad hoc statistics on the publicly available datasets to characterize them. A significant contribution extending the previous work is the use of a skipped-CNN instead of a GRU for the last part of the DNN architecture, which improves on the previous results.
General impression
There is an important improvement in the overall impression of the paper compared to the previous version. The authors have effectively narrowed the scope, motivated the work better, and highlighted more clearly the specific novelty and contributions to the research area. The results obtained are now easily interpretable.
The work is an original integration effort in which different existing DNN architectures are combined to approach the hate speech detection problem. The experiments are performed on different datasets, and the evaluation metrics are well reasoned. The results obtained are superior, by a small margin, to the reproduced baselines and to the results of the conference paper.
Writing and presentation
The paper is, in general, well written. The main problems found previously have been fixed, and clarity has improved substantially. A final revision according to the publication style guide still needs to be done. A few comments are included in the per-section review.
Per-section review
1. Introduction
The authors have clarified the contributions of the paper, consistently with the content of the work.
2. Related work
The section has been considerably reduced and better focused.
The added Section 2.3 includes the reasoning on the importance of focusing on the results for the hate class. This is, in my opinion, an important study, as it exposes a flaw in the evaluation methodology used by most of the related work and provides support for better evaluation practices in the future. It also helps put the improvements made by the proposed skipped-CNN architecture in context.
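To illustrate the point, a minimal sketch (my own example with made-up labels, not data from the paper) of how overall accuracy can look acceptable while the hate class is missed entirely:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: 1 = hate (minority class), 0 = non-hate (majority).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A degenerate classifier that predicts "non-hate" for every tweet.
y_pred = [0] * 10

print(accuracy_score(y_true, y_pred))                          # 0.8 -- looks fine
print(f1_score(y_true, y_pred, pos_label=1, zero_division=0))  # 0.0 -- hate class missed
```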
3. Dataset Analysis
The detailed explanation of the creation of the RM dataset has been removed, since it has already been published in another work. In my opinion this is a good choice, but I would appreciate a reference to that work and a link to the files.
The explanation of the metrics used has been shortened, and the explanation of the figures has been improved. The whole section is clearer, including the explanation of the findings. Although it remains quite general and superficial, it is aligned with the rest of the paper.
4. Methodology
The introduction specifies more clearly the contributions of the paper and the baselines it is compared to.
In Section 4.1, in the explanation of the architecture (third paragraph), the strides of the convolutional and pooling layers are not specified.
From the text, it seems that the 1D max pooling is applied to the joined output of the different convolution sizes, so a single pooling operation may contain inputs from different convolution sizes. Is this correct, or are the branches actually processed separately? The sketch below makes the two readings explicit.
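A minimal Keras sketch of the two readings (illustrative only: the sequence length, embedding size, filter counts, kernel widths, and strides are my assumptions, not values from the paper):

```python
from tensorflow.keras import Input, layers

x = Input(shape=(100, 300))   # hypothetical: 100 tokens, 300-dim embeddings
convs = [layers.Conv1D(64, k, strides=1, padding="same",
                       activation="relu")(x) for k in (3, 4, 5)]

# Reading A: pool each convolution branch separately, then concatenate.
pooled_separately = layers.Concatenate()(
    [layers.MaxPooling1D(pool_size=4, strides=4)(c) for c in convs])

# Reading B: stack the branch outputs along the time axis first, so a single
# pooling window near a branch boundary mixes features produced by
# different kernel widths.
joined = layers.Concatenate(axis=1)(convs)
pooled_jointly = layers.MaxPooling1D(pool_size=4, strides=4)(joined)
```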
In the skipped-CNN section, it would be fair to include references to other works (beyond Mikolov) that used the same idea in other contexts, e.g., atrous convolutions in image processing or word-level skip-grams for sentiment analysis.
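For concreteness, the same "skipping" idea is what a dilated (atrous) convolution expresses; a one-line Keras sketch (parameters are illustrative assumptions):

```python
from tensorflow.keras import layers

# With dilation_rate=2 the kernel taps skip every other input step,
# the same gap-between-inputs idea as the skipped convolution.
skipped_conv = layers.Conv1D(filters=64, kernel_size=3,
                             dilation_rate=2, padding="same")
```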
5. Experiments
The part on word embeddings has been reduced to its most important findings. I consider this reduction highly beneficial for simplifying the presentation of the results.
Replacing Figures 6-8 with tables also makes the results easier to interpret.
Minor comments:
- Implementation: the plural of "epoch" is "epochs".
- 5.1 TF-IDF: write out the full name (term frequency-inverse document frequency) at the first occurrence.
- According to the style guide, table captions should be placed above the tables. It is preferable to include the leading zero in the results: .81 → 0.81 (or to remove the "." indicating the units in the caption).