What Can Tweets and Knowledge Graphs Tell Us About Eating Disorders?

Tracking #: 3124-4338

Jose Alberto Benitez-Andrades
Maria Teresa García-Ordás
Mayra Russo
Ahmad Sakor
Luis Daniel Fernandes Rotger
Maria-Esther Vidal

Responsible editor: 
Guest Editors SW Meets Health Data Management 2022

Submission type: 
Full Paper
Social networks have become information dissemination channels, where announcements are posted frequently; they also serve as frameworks for debates in various areas (e.g., scientific, political, and social). In particular, in the health area, social networks represent a channel to communicate and disseminate the success of novel treatments; they also allow ordinary people to express their concerns about a disease or disorder. In response, the Artificial Intelligence (AI) community has developed analytical methods to uncover and predict patterns from posts that help explain news about a particular topic, e.g., mental disorders such as eating disorders or depression. Although posts can be rich in expressing an idea or concern, they are presented as short texts, which prevents AI models from accurately encoding their contextual knowledge. We propose a hybrid approach where knowledge encoded in community-maintained knowledge graphs (e.g., Wikidata) is combined with deep learning to categorise social media posts using existing classification models. The proposed approach resorts to state-of-the-art named entity recognizers and linkers (e.g., FALCON 2.0 and EntityLinker in the spaCy Python library) to extract entities in short posts and link them to concepts in knowledge graphs (e.g., Wikidata). Then, knowledge graph embeddings (e.g., RDF2Vec) are utilised to compute latent representations of the extracted entities, which result in a vector representation of the posts that encodes these entities' contextual knowledge extracted from the knowledge graphs. These knowledge graph embeddings are combined with contextualized word embeddings (e.g., BERT) to generate a context-based representation of the posts that empowers prediction models.
We apply our proposed approach in the health domain to detect whether a publication is related to an eating disorder (e.g., anorexia or bulimia) and uncover concepts within the discourse that could help healthcare providers prevent and diagnose this type of mental disorder. We evaluate our approach on a dataset composed of 2,000 short texts related to eating disorders. Our experimental results suggest that semantically enriching these messages with knowledge graphs increases the reliability of the resulting predictive models compared with models that do not use the knowledge collected from Wikidata. The ambition is that the proposed method can support health domain experts in discovering patterns that may forecast a mental disorder, enhancing early detection and more precise diagnosis towards personalised medicine.
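The abstract does not specify how the knowledge graph embeddings and the contextualized text embeddings are combined. As a minimal sketch, assuming that the vectors of the entities linked in a post are averaged and then concatenated with the post's text embedding, the combination step could look as follows. The function names, the toy vectors, and the entity identifiers `Q_A` and `Q_B` are hypothetical placeholders, not taken from the paper or from Wikidata.

```python
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Average equal-length vectors; return a zero vector if none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def post_representation(
    text_vec: Sequence[float],
    entity_ids: Sequence[str],
    kg_embeddings: Dict[str, Sequence[float]],
    kg_dim: int,
) -> List[float]:
    """Concatenate a text embedding with the mean embedding of linked entities.

    Assumption: averaging followed by concatenation is only one possible way
    to combine the two representations.
    """
    # Entities without a known embedding are skipped.
    linked = [kg_embeddings[e] for e in entity_ids if e in kg_embeddings]
    return list(text_vec) + mean_vector(linked, kg_dim)


# Toy example with hypothetical values (real embeddings have hundreds of dimensions).
toy_kg = {"Q_A": [1.0, 0.0], "Q_B": [0.0, 1.0]}
toy_text_vec = [0.1, 0.2, 0.3]
combined = post_representation(toy_text_vec, ["Q_A", "Q_B"], toy_kg, kg_dim=2)
print(combined)  # [0.1, 0.2, 0.3, 0.5, 0.5]
```

A post with no linked entities falls back to a zero vector for the knowledge graph part, so every post has a representation of the same length regardless of how many entities were found.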

Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 23/May/2022
Minor Revision
Review Comment:

The manuscript presents a methodology to extract medical information from social media posts, with the support of knowledge graph representations. The manuscript is well-structured and easy to follow. The authors also provided a link to the source code of the experiments described in the manuscript, which increases its quality. This work also fits the journal's scope, as it adopts semantics to detect publications of interest in social media. The idea of using the medical information present in social media has been a topic of interest in the research community due to the growing volume of such information. Although this is not curated information, having a huge amount of data to conduct statistical studies can help medical researchers, and this work focuses on the task of identifying the publications that may support health experts.

Although I believe the paper is solid and the presented approach well described, I have the following comments:

As pointed out, tweets are short, which makes them challenging for deep learning models in NLP classification tasks. Thus, the authors should explain why they disregard emojis.
As presented by the authors, surface forms are linked to Wikidata entities. It would be interesting if the authors could elaborate briefly on the possibility of using several KBs simultaneously. For example, Wikidata + Emojipedia or EmojiNet.
The authors may mention the lexical gap (mismatch between named entities and knowledge graph node names) and the ambiguity (the same string having different meanings) problem related to the EL process.

Review #2
By María Navarro Cáceres submitted on 16/Jun/2022
Minor Revision
Review Comment:

This paper presents a novel hybrid approach to detect eating disorders through Twitter publications. In general, the paper is well structured and easy to follow. The contributions are quite clear and interesting for the community. The authors have designed an original hybrid approach combining deep learning and knowledge graphs to improve the accuracy of ED detection. In general, the experiments are well designed, with an extensive comparison among different algorithms to highlight the strength of the proposal.

Some minor questions:
- Please review the English writing in the introduction. There are some repetitions (for example, the authors use the word "techniques" several times in one sentence, etc.).
- Fig. 1 is too far from where it is first mentioned. Maybe you can split the information in the figure, or create a simpler figure with the essential information for the introduction.
- How were the hashtags for the database selected? Even if they can bias the information, could the authors justify why these are important? For example, were they selected by consulting medical experts? I suggest adding some justification for this choice.
The data files provided are well organized and include a README file, which makes the data easy to assess. The provided resources allow the replication of experiments.

Review #3
Anonymous submitted on 05/Jul/2022
Review Comment:

This paper presents an approach for finding tweets about eating disorders. The main contribution is to add the use of knowledge graph embeddings into the data for the classification process. A 2K tweet dataset has been labeled and used for training and testing.

Strong points:
-Contributes a new labeled dataset to the community.
-Well-written and well-structured paper.

Weak points:
-Low novelty.
-Motivation for finding ED tweets is not clear.
-Questionable evaluation protocol.

Additional comments:
-It appears that the main contribution is the use of knowledge base embeddings for this particular problem; otherwise the approach is quite similar to what has been used for other domains.
-It's not easy to understand exactly why it's important to find tweets about ED (except those you can already find by search/filter).
-Misleading title: This paper is about classification of tweets.
-A separate validation set should have been used, not only training and test, otherwise overfitting is likely (and contributing to the high accuracy).
-No comparison with methods from related work, e.g., [8]
-As the authors also point out, the way the dataset is created (including labeling) might bias the sampling and results. I would have liked to be convinced that this is indeed not affecting the results; otherwise the performance is quite meaningless to me.
-Except for the fact that knowledge graph embeddings are used, I don't see a strong contribution wrt. the semantic web research area.
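
To make the validation-set comment above concrete, the following is a minimal sketch of a three-way split. It assumes a 70/15/15 partition and a fixed random seed; the proportions, the seed, and the function name are illustrative assumptions, not the protocol used in the paper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def three_way_split(
    items: Sequence[T],
    val_frac: float = 0.15,
    test_frac: float = 0.15,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items reproducibly and split them into train, validation, and test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Example with 2,000 placeholder tweet indices, matching the dataset size.
train, val, test = three_way_split(range(2000))
print(len(train), len(val), len(test))  # 1400 300 300
```

Hyperparameters and early stopping would then be tuned on the validation part only, with the test part used once for the final reported figures.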