Semantic Modeling for Engineering Data Analytics Solutions

Tracking #: 1992-3205

Madhushi Bandara
Fethi A. Rabhi

Responsible editor: 
Oscar Corcho

Submission type: 
Survey Article
Data analytics solution engineering often involves multiple tasks, from data exploration to result presentation, that are applied in various contexts and on different datasets. Semantic modeling based on the open world assumption supports flexible modeling of linked knowledge. The objective of this paper is to review existing techniques that leverage semantic web technologies to tackle challenges such as heterogeneity and changing requirements in data analytics solution engineering. We explore the application scope of those techniques, the different types of semantic concepts they use and the role these concepts play during the analytics solution development process. To gather evidence for the study, we performed a systematic mapping study by identifying and reviewing 82 papers that incorporate semantic models in engineering data analytics solutions. One of the paper's findings is that existing models can be classified into four types of knowledge spheres: domain knowledge, analytics knowledge, services and user intentions. Another finding shows how this knowledge is used in the literature to enhance different tasks within the analytics process. We conclude our study by discussing limitations of the existing body of research, showcasing the potential of semantic modeling to enhance data analytics solutions and discussing the possibility of leveraging ontologies for effective end-to-end data analytics solution engineering.

Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 30/Sep/2018
Major Revision
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

I believe I reviewed a previous version of this manuscript. If so, this is a much improved version. It is good that the authors start by clearly defining the scope of the paper, the key concepts and the research questions. The structure is good and the findings are interesting. Some shortcomings remain, but they should not be too difficult to address.

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.
More explanation of what semantic modeling is would be needed; otherwise suitable.
(2) How comprehensive and how balanced is the presentation and coverage.
The authors need to explain their choice of application ontologies; otherwise suitable.
(3) Readability and clarity of the presentation.
(4) Importance of the covered material to the broader Semantic Web community.
Moderate importance.

Major issues
1. p.2: “UML diagrams [4, 24, 25], petri-nets [26] and decision modeler [21] but the focus of this paper is on semantic models”
The authors should explain the principal differences between these modelling approaches: UML vs. Petri nets vs. semantic models. This would also give readers the necessary understanding of what semantic modelling involves.
2. I am confused about the set of ontologies considered. At the beginning, the authors explain that they focus on DAS engineering, but they then consider various application-focused ontologies. The authors should state clearly whether these application ontologies are only examples of such domain models. If not, then there are surely more application ontologies than those considered in the paper; BioPortal alone has more than 600 of them.
3. There is no clear separation between application-specific and standard ontologies. Why are the Gene Ontology and the GALEN ontology considered standard rather than application ontologies? Note that GO has the status of a reference ontology, while I believe GALEN does not.
- ‘RDF, RDFS and OWL are the core building blocks of an ontology’. This statement is too vague and not accurate: they are languages for encoding ontologies.
- Make figure captions more informative and self-contained.
- Be consistent. For example, p.1 uses three variants of the same term: ‘data analytics solution’, ‘Data Analytics Solution’ and ‘DAS’.
- The manuscript is well written but still requires proofreading; there are several grammatical errors. For example, on p.2:
1st paragraph of the Background section:
‘a models’ -> ‘a model’ or ‘models’
The same paragraph:
“The literature emphasizes the significance of knowledge management in different fields such as enterprise data analytics [21] and scientific workflow [22] and there has been many attempts at identifying knowledge specific to DAS.“ – there should be a comma after ‘[22]’.

Review #2
By Ilaria Tiddi submitted on 02/Oct/2018
Minor Revision
Review Comment:

I have seen the authors' response and the new version, and I acknowledge a significant improvement in the work. Most of my questions were answered satisfactorily, and I find the paper much clearer than its previous version.

I have a few very minor comments relating to form (typos and minor issues), which should be easily addressed:
- page 1 : "There is no universally accepted definition for data analytics process" >> There is no universally accepted definition for the process of data analytics
- Data Analytics Solution (DAS) that capture >> Data Analytics Solution (DAS) that captures
- The process that represents data analytics solution is related to the discipline of data science. >> I would remove this, the previous & following sentences are connected already
- Also be consistent, DAS Engineering or DAS engineering? (I would go for the second)
- there is no one model that works best for every problem >> there is not a model working best for every problem
- page 2: when you say "There has been many recent efforts", could you provide any evidence (e.g. an increasing number of publications)?
- page 3: Abello et. al [7] study is specially about using semantic web technologies >> "In particular, Abello et al. [7] study the use of sw technologies [...], while Ristoski ...
- page 3, section 3 : "As our objective was to provide an overview of how semantic technology is used in DAS engineering" >> As our objective is [...] in DAS engineering ,
- page 4, section 3.6 : question 1 and 2- >> question 1 and 2, i.e. ...
- page 4, section 3.6 : classification schema in top-down fashion >> classification schema in a top-down fashion,
- page 4, section 3.6 : concepts respectively . The third one - "Metadata Ontologies" >> concepts respectively. The third one, i.e. (or called) "Metadata Ontologies"
- following sentence : is very high-level and vague we >> is very high-level and vague, we
- page 5, section 3.6 : DAS related tasks >> DAS-related tasks
- and further down : enterprise oriented >> enterprise-oriented
- section 3.6 last par : from *the* 82 identified studies ... the tasks proposed in *the* 82 studies ...
- page 5, Fig 1 : " Identified Tasks from Literature" >> Identified Tasks from the literature. (with a final dot!) You might also want to specify the meaning of the arrow (and whether there is a parallel between 1-5 and 6-9 )
- section 4.2.1: "domain specific knowledge" >> "domain-specific knowledge"
- 4.2.2. : [S22,S27,S35,S66] >> [S22, S27, S35, S66]
- Tables 1 & 2: the captions should be on the same line as the name of the table; did you add an extra \\ ? There is also something odd with the borders of your multicolumns, probably too many pipes (|)?
- Figure 2: Add a dot at the end of the caption! The time analysis is quite interesting; do the authors have any explanation for the spike in 2014 and 2015 (maybe extend your thoughts before 4.3.1)?
- page 15 : OntoKDD S60 and OntoDM S79 >> OntoKDD (S60) and OntoDM (S79),
- page 6, when you mention the web of science database, I assume it is : ? (please refer)

Review #3
By Luca Costabello submitted on 06/Nov/2018
Review Comment:

I thank the authors for having addressed the points I raised in my previous review. I appreciate the effort. More specifically:

1. The proposed rewording helps clarify the angle and the rationale of the paper, although the scope remains broad.
2. The revised version of the paper is limited to rewording the content of the first submission. Hints at cost benefits are not backed up by evidence, so it would be better to avoid mentioning cost benefits at all.
3. Thanks for adding Fig. 2. Although it is limited to the number of publications per year by topic, it sufficiently addresses my point.
4. Thanks for the refinements.
5. Revision clarifies the ambiguity, thanks.
6. I understand that the revised scope of the paper requires a slightly broader angle, hence the authors' editing is sufficient to address this point.
7. Thanks for including works from major semantic web venues.

In conclusion, the authors sufficiently addressed my main concerns in this new submission.