Producing Linked Data for Smart Cities: the case of Catania

Tracking #: 930-2141

Sergio Consoli
Misael Mongiovì
Diego Reforgiato Recupero
Silvio Peroni
Aldo Gangemi
Andrea Giovanni Nuzzolese
Valentina Presutti

Responsible editor: 
Guest Editors Smart Cities 2014

Submission type: 
Full Paper
Semantic Web technologies and in particular Linked Open Data provide a means for sharing knowledge about cities as physical, social, and technical systems, so enabling the development of smart city applications. This paper presents the case of Catania with the aim of sharing the lessons learnt, which can be reused as reference practices in other cases with similar requirements. The importance of achieving syntactic as well as semantic interoperability - as a result of transforming heterogeneous sources into Linked Data - is discussed: semantic interoperability must be solved at data level in order to ease the development of smart city applications. This claim is supported by showing how this issue impacts on the design of two smart city applications. As main contributions, the paper describes: (i) methods, procedures, and tools used for transforming heterogeneous sources into Linked Data; (ii) an ontology design pattern for modelling urban public transportation routes; (iii) methods, procedures and tools for ensuring semantic interoperability during the transformation process; (iv) the design of two smart city applications based on Linked Data. All produced data, models, and prototypes are publicly accessible online.

Solicited Reviews:
Review #1
By Alessandra Mileo submitted on 21/Dec/2014
Review Comment:

This paper illustrates how semantic technologies have been used to enable integration of smart city data as Linked Data in the case of the City of Catania.
In the same context, authors develop two different applications based on the semantically annotated and linked data sources.

General Comment:
I believe the novelty of the techniques illustrated in the paper is not sufficient for a full paper. There are very few references to the state of the art, which is only provided as a list of available related solutions, without clearly identifying where the authors have adapted off-the-shelf tools and algorithms, or where the city of Catania differs from other smart city applications.

Detailed comments:
- authors claim a set of requirements should be drawn from the Catania example, but I didn't find any indication of lessons learned or guidelines that might be relevant for other similar cities.
- producing linked data is one thing, publishing them properly is another problem. The authors seem to use these two terms interchangeably; it would be good to clearly see where the production and the publishing have been addressed, respectively.
- page 2, sec 1, authors claim that through linking knowledge, correlations can be quickly understood. This is not true, since it would require some analysis and inference and other tools to understand correlations, while linking can help reuse and integration, or even enable the use of interoperable tools to understand correlations.
- LOD is not widely adopted, LOD is a set of datasets. LOD principles might be adopted (although not so widely as we might think, if we look at real city cases).
- Literature review is a bit approximate. Things like the use of linked data streams are mentioned but not related to any of the contributions of the paper, since it seems to act mostly on static datasets. Although the data is expressing dynamic properties, the query and processing mechanisms to deal with Linked Streams, which are the core of LSM and continuous query processing, are not in the scope of this paper.
- Additionally, use case scenarios are dealing with streams but there is no technical contribution of this paper on the processing side.
- section 3 mentions the generation process is language-independent. This is a strong claim when we talk about semantic annotation, which has been experienced and documented in a lot of EU Smart City projects. Therefore it is necessary to either provide pointers and justify the claim, or explain in more detail to what extent the solution is language-independent.
- A similar comment applies to entity linking: are SOTA techniques used or is there anything new/specific to the city? Linking to DBLP is not novel and has its limitations. The paper does not discuss why this is appropriate in this case.
- section 3.2 uses state-of-the-art techniques for data transformation in RDF, and it is not clear how the choice for Catania helps in identifying requirements or guidelines on which methods to choose for this virtualization step.
- a lot of adaptation and custom scripts (which are strongly data-dependent!) are used in section 3, but it is not clear how much manual effort is involved in this process; also, the choice of custom solutions over more flexible ones has not been sufficiently motivated.
- authors claim alignment with standard vocabularies has been done when possible, but there is no detail provided on the level of reusability of existing solutions and on when alignment was not possible. Reusability and resort to standard vocabularies is one of the key aspects of interoperability and reuse for Linked Data.
- 3.3 mentions the use of ontology design patterns in other smart city projects: references would be required here.
- more motivations re. use of certain patterns for ontology design should be provided: this would drive the requirements for data abstraction based on the experience of Catania but it is not present in the paper.
- 3.5 mentions again standard ontologies but no specific references are provided, and it is not clear how the examples have been realized: how much manual involvement was required and is this an issue? if not why?
- Section 3.6 illustrates the query aspect. Distributed solutions based on CKAN or similar platforms are in general more scalable for query processing, therefore the centralized resulting dataset with millions of triples is not the best approach. Authors should justify why they made this choice and how it would compare with a distributed approach.
- There are no APIs or reusable solutions for the implementation of the scenarios, which limits the work in terms of impact and uptake.
- Also, the intelligence on top of the data integration layer should leverage the semantic description. Although a knowledge representation model is described, it is not indicated what tools have been used for reasoning over such a model, and how they perform. The state of the art in location estimation is not relevant to a knowledge-based data model; more relevant comparisons with rule-based approaches, constraint programming, and planning and scheduling approaches applied in smart city solutions should be referenced.
- the idea to have a system for emergency re-routing is not detailed: is there a prototype or is this just an idea? Again, there are state-of-the-art solutions that have not been mentioned, and that strongly leverage semantics and continuous query processing over Linked Data Streams.

There are a few typos throughout the text that need to be corrected, e.g.
- emergent should be emerging
- "may used" should be "may be used"
- "mode details" should be "more details"
- "objects in a map" should be "objects on a map"
- …
I suggest doing a spell check and reading through the text carefully.

Review #2
By Carlos Granell submitted on 02/Jan/2015
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The manuscript clearly matches the topic of the special issue about the role of semantics in smart cities. The topic obviously is broad and this manuscript focuses exclusively on the creation of linked data which can be useful in smart city scenarios. So, I found the paper very illustrative in the sense that it is a step-by-step description of how to produce RDF-based linked data from diverse data sources. The authors relied on well-known practices, tools and ontology design patterns to align the resulting data to existing semantic vocabularies within the proposed ontology. This is then the main motivation and contribution of this submission: a practical case to bring semantic data into smart city applications. It is of value that the semantic repository is going to be used by the local administration in Catania.

I am wondering whether the paper as it is now (a set of detailed steps to generate RDF from diverse sources) fits the purpose of the SWJ journal. It seems clear that the paper does not provide novel contributions to the state of the art on data semantics. It is otherwise a practical, case study paper, which is also quite fine. Nevertheless, to this reviewer, it fails to present all the key aspects needed to become an excellent case study paper. If others want to reuse the methods and procedures described in the manuscript, the authors should also provide a section to discuss the problems and limitations they encountered during the experiment, and not only what went smoothly.

While the exposition of Section 3 is well shaped, easy to follow, and its significance easy to understand, Section 4 (Use case) is not sufficiently clear for the general reader. I believe that a more thorough rewrite might help here to make it more understandable to the general public. This of course is not meant to diminish the effort and value of the reported experiment, but the use case examples seem to be somehow disconnected from Section 3, in that each section seems to target distinct end users/readers.

Review #3
By Freddy Lecue submitted on 23/Jan/2015
Review Comment:

Producing Linked Data for Smart Cities: the case of Catania

This paper presents methodologies for transforming city-related data in Linked Data together with an ontology for public transportation routes. Two applications have also been designed in this paper.


The introduction is nice and in-depth but it is rather difficult to extract the glue between the various contributions. I would recommend that the authors better articulate the contributions of the paper. In its current shape, I see the paper as a list of contributions, and it is not clear what overall objective is achieved. Could you re-articulate the Introduction in line with the contributions?

The contribution related to "ensuring semantic interoperability during the transformation process" is not clear. I have read this claim and its "explanation" but could not parse any meaning. Could you rephrase or better position it? I understood it as "semantic linkage" through reasoning, e.g., consistency checking, but I am not sure this is what is meant in this contribution.

It is not clear why semantic Web technologies and Linked Open Data are MUST technologies for addressing the two scenarios. It is not clear from the Introduction where semantic Web technologies are considered for addressing the challenges, which by the way are?

"All produced data, models, and prototypes are publicly accessible online": Links should be provided here, otherwise I do not see the interest of having this sentence here.

Literature review

The literature is not well positioned. I would expect the authors to step back and look at what has been done in the context of smart (non-semantic) cities and then transition to smart semantic cities.

This section is also too general and far too focused on LOD. I would have expected more references to the work that has been done in various cities, e.g., Dublin [1], Rio [2] or others [3] in Transportation.

Not sure you need to spend that long on LOD initiatives in different countries. I would concentrate on applications for cities, and more importantly on the data integration problem (as the problem you are tackling has been largely addressed by the Semantic Web and Database communities, albeit not with that focus on the city application).

Building a Government Data Model for Smart Cities

This section needs to be compacted. I would definitely prefer a table summarizing all data sets and features rather than lengthy text that needs to be parsed.

There are a lot of useless or unexplored details that would need better motivation to be included in the paper, e.g.,
(1) "Those data have been re-engineered, following the directions given by information analysts and data experts of the Municipality of Catania with respect to the considered reference domains" is not explained. So, what type of re-engineering? Was it complex? Was it time-consuming?
(2) "Each of the supplied information data sources has required a different methodology to be analyzed": What do you mean by analysis?
(3) "consisting of several databases". And so what? Did that complicate the process? Did you address the distributed settings?

Figures 1 and 2 are too large, please resize. Not sure Figure 2 helps in understanding the methodology: it is a simple KML file, which does not help by any means. I would suggest that the authors remove it.

What do data streams mean in Table 1? A data stream should refer to dynamic information. Where are the dynamics here?

Why Tabels? You should explain why this tool was better than any other. Maybe other cities have different needs, and applying this methodology to other cities will not work? Or maybe it will? But we need more details to help other cities decide on one tool / methodology or another.

The resolution of Figure 4 is not really good. Could you update it?

There is no discussion of why a specific vocabulary has been used. What is the intention behind using one vocabulary and not another one? Or maybe that is switchable. At least that should be discussed to help other cities take the right decisions based on your experience.

Transforming XML data into RDF can be easily done with XSLT. Why did you need other scripts for that? Again, we need strong evidence and motivation behind the choices which have been made in this project.
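
To make this point concrete, a minimal sketch of how small such a mapping can be is given below. It is not XSLT and not the authors' pipeline: it uses Python's standard-library XML parser as a stand-in, and the element names, the `http://example.org/` namespaces and the function names are all assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: element and attribute names are invented for illustration.
SAMPLE_XML = """
<stops>
  <stop id="12"><name>Piazza Duomo</name><lat>37.5024</lat></stop>
  <stop id="15"><name>Stazione Centrale</name><lat>37.5067</lat></stop>
</stops>
"""

BASE = "http://example.org/stop/"      # assumed namespace, not the paper's
VOCAB = "http://example.org/vocab#"    # assumed vocabulary, not the paper's


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def xml_to_ntriples(xml_text):
    """Map each child element of a <stop> to one triple about that stop."""
    triples = []
    for stop in ET.fromstring(xml_text).iter("stop"):
        subject = f"<{BASE}{stop.get('id')}>"
        for child in stop:
            predicate = f"<{VOCAB}{child.tag}>"
            value = escape_literal((child.text or "").strip())
            triples.append(f'{subject} {predicate} "{value}" .')
    return triples


if __name__ == "__main__":
    print("\n".join(xml_to_ntriples(SAMPLE_XML)))
```

A generic mapping of this kind emits plain string literals only; typed literals, links between resources and vocabulary alignment are where source-specific scripts might become necessary, which is exactly the motivation the paper should spell out.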

A lot of details are not really required, such as the in-depth description of the "Maintenance of the public lighting system of the city". Not sure that really helps to understand the contribution of the paper.

It would have been better to provide details on the procedure for transforming XML data to RDF in one or two examples, and to cut out the details of the data sets.

Figure 5 is not required - please remove.

This section is overloaded. Better to have details on data sets, then procedure for transforming and then details on ontology alignments. This section needs a better presentation.

The following is true: "The alignment was a manual process done by domain experts. Although methods for automatic alignment exist [47], they are not as precise as human judgment". However, more is needed to explain why existing techniques fail in your setting while they succeed in others. Could you give more details?

Use Cases

The DL formalism is not appropriate in this location, and is not even introduced earlier in the paper.

No experiments are reported on the scalability and accuracy of the systems used, so it is difficult to get anything valuable for replication in other cities.

Algorithm 1 in Figure 16 is trivial; I am not sure you need it. A few sentences about the process would be more appropriate, especially in a section related to use cases.

In general this part is OK but it would need some experiments and lessons learnt (on a much more practical dimension) on deploying semantic web technologies in the city context.

Discussions and conclusions

This section is rather a conclusion than a discussion. This kind of paper should emphasize the benefits and limitations of the technologies, which are missing in this version. Even though the general idea is good, and the use case as well, the level of description is not appropriate.

[1] Freddy Lécué, Simone Tallevi-Diotallevi, Jer Hayes, Robert Tucker, Veli Bicer, Marco Luca Sbodio, Pierpaolo Tommasi: Smart traffic analytics in the semantic web with STAR-CITY: Scenarios, system and lessons learned in Dublin City. J. Web Sem. 27: 26-33 (2014)

[2] Freddy Lécué, Robert Tucker, Simone Tallevi-Diotallevi, Rahul Nair, Yiannis Gkoufas, Giuseppe Liguori, Mauro Borioni, Alexandre Rademaker, Luciano Barbosa: Semantic Traffic Diagnosis with STAR-CITY: Architecture and Lessons Learned from Deployment in Dublin, Bologna, Miami and Rio. International Semantic Web Conference (2) 2014: 292-307

[3] Plu, J., Scharffe, F.: Publishing and linking transport data on the web. International Workshop On Open Data abs/1205.1645 (2012)