A Semantic Approach to Reducing GHG Emissions

Tracking #: 3687-4901

Authors: 
Kimberly Garcia
Jan Grau
Nicolas Kesseli
Ioannis Katis
Monica Arnaudo
Alexander Kirsten
Didier Beloin-Saint-Pierre
Simon Mayer

Responsible editor: 
Eva Blomqvist

Submission type: 
Full Paper
Abstract: 
In the year 2015, 196 countries signed the Paris Agreement, which aims at keeping the rise in mean global temperature below 2°C above pre-industrial levels. Governments have since launched awareness campaigns and tightened regulations, motivating companies and governmental organizations to reduce their direct greenhouse gas (GHG) emissions and the indirect emissions of their value chains. To monitor and report on GHG emissions, companies follow standardized methodologies which today remain costly, time-consuming, and require extensive human expertise. In this paper, we present a Knowledge Graph (KG) that forms the semantic backbone of an interdisciplinary research project that aims to significantly reduce the time and effort that environmental accounting experts spend gathering relevant data and validating it. To facilitate data gathering, instead of proposing the creation of a new standard, we created ontologies and management tools for three of the most common GHG data formats—ILCD, EcoSpold01, and EcoSpold02—and we propose a bridge ontology to seamlessly query data expressed in any of these formats. To take advantage of already widely-used ontologies, increase interoperability, and integrate expert knowledge, we follow the Simplified Agile Methodology for Ontology Development to create the WISER ontologies, which are part of the proposed KG and have been created to permit automatic responses to requests by environmental scientists and to capture their domain knowledge. To demonstrate the effectiveness of our KG-based approach, we present a tool for data gathering that has been validated by environmental accounting experts. The proposed KG aims at decreasing the effort required for GHG emissions reporting while increasing its transparency and reproducibility. It furthermore democratizes access to GHG emissions data for environmental accounting experts, companies, auditing authorities, and regulatory bodies.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Daniel Hernandez submitted on 03/Nov/2024
Suggestion:
Major Revision
Review Comment:

This paper proposes an ontology for assessing greenhouse gas emissions based on three data formats: ILCD, EcoSpold01, and EcoSpold02. The paper tackles the highly relevant problem of integrating diverse datasets with a uniform ontology. The paper is well written and easy to follow from the start. However, it says almost nothing about its central subject: the proposed WISER ontology. The authors describe only two competency questions, which concern (1) the geographic coverage of a dataset and (2) the certainty about that geographic coverage. The central aspect of the description is thus related neither to greenhouse gas emissions nor even to sustainability, but to geolocation. The authors state that all other competency questions are described in the supplementary material. This is insufficient, because the title and abstract lead readers to expect a description of how a unifying ontology for the three data formats was created.

I would recommend rejecting the paper. However, being generous and considering the possible impact of this work, I recommend the paper for a major revision so they can improve the description of the proposed ontology.

Questions and issues:

Q1. In Section 2.2, the authors enumerate some ontologies that have similar objectives. They claim that the lack of adoption of these ontologies is due to the difficulty of shifting an entire community towards creating new databases based on a new knowledge model. I wonder why, unlike these ontologies, the one proposed in this paper would be adopted.

Q2. In Section 3, the authors said they “focus” on ILCD, EcoSpold01, and EcoSpold02 to create the WISER knowledge graph. First, the word “focus” suggests that the authors pay more attention to these formats but may also consider other formats. I wonder if they also considered the other two formats mentioned in Section 2.4, and if they do not consider these formats, why not? If these three formats are the only ones used to define the WISER knowledge graph, I suggest replacing the word “focus” with a more precise one.

Q3. Section 3.2 describes a mapping to translate the XML files from the existing data formats into RDF. The authors implement this mapping using a custom algorithm. Why not use a mapping language like RML to define the mapping from XML to RDF?

Q4. Section 3.3 describes the creation of a bridge ontology between the concepts generated from mapping the three data formats to RDF. Please confirm that this bridge ontology consists of axioms A ⊑ B ⊔ C, where A is a bridge concept and B and C are the original concepts, and similarly for properties, or if there is something more. The authors said they based this bridge ontology’s construction on the openLCA project. What is the additional work over the openLCA project contributions? How many of these axioms did you provide? Was this process entirely manual?
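
For reference, the axiom shapes that Q4 to Q6 distinguish can be written out as below. This is only a sketch of the alternatives being asked about, not a description of the submitted ontology; the symbols p, q, and r (a bridge property and two original properties) are introduced here for illustration and do not appear in the review or the paper.

```latex
\begin{align*}
A &\sqsubseteq B \sqcup C
  && \text{every instance of the bridge concept belongs to an original concept}\\
B &\sqsubseteq A, \quad C \sqsubseteq A
  && \text{the original concepts are subsumed by the bridge concept}\\
A &\equiv B \sqcup C
  && \text{both of the above together}\\
A &\equiv B
  && \text{plain class equivalence (the case raised in Q5)}\\
q &\sqsubseteq p, \quad r \sqsubseteq p
  && \text{original properties subsumed by a bridge property (the case raised in Q6)}
\end{align*}
```

Note that only the second and third shapes let a query for instances of A return the instances of B and C under entailment, which is the behaviour a bridge concept is presumably meant to provide.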

Q5. On page 6, line 32, the authors mention “equivalent classes.” So, it suggests that some classes are assumed to be equivalent, and thus, it suffices to add an axiom A ≡ B. Please confirm. How many equivalent axioms did you include?

Q6. On page 7, line 1, the authors say that Figure 3 shows equivalent properties. However, the example shows properties that are not equivalent but subsumed by a common bridge property. Please clarify this statement.

Q7. The scenario in Section 4.1 describes a large international manufacturing company. Is this company fictitious, or is it a real one whose name the authors are withholding?

Q8. Footnote 10, referenced on page 9, line 4, does not appear. Also, I do not recommend footnotes in the equation environment because the footnote mark can be read as an exponent.

Q9. Please explain the central aspects of the ontology you are proposing. As I already commented, the current description only covers geolocating datasets, and it says almost nothing about the more specific questions related to greenhouse gas emissions.

Review #2
By Eva Blomqvist submitted on 11/Nov/2024
Suggestion:
Major Revision
Review Comment:

The paper presents a new ontology for describing greenhouse gas emission data, bridging a number of existing formats in the area. The paper addresses an interesting and pressing problem, which is very real for many companies struggling to meet reporting standards and to assess their environmental impact in line with regulations and guidelines. While this is a practical contribution, I currently do not see a sufficiently clear research contribution, and both the discussion of the state of the art and the evaluation of the proposed artefact lack depth. Alternatively, this could be converted into an ontology paper, shifting the focus from the scientific contributions to a practical, reusable resource. However, even with that in mind, both the related work and evaluation sections would need considerable improvement.

More specifically, regarding the journal's evaluation criterion (1), originality, I do not see the original research contribution, i.e. the new knowledge that this paper contributes to the community. I think that the ontology architecture presented, i.e. "bridge" ontologies as mapping tools for expressing connections and transformations between standards and formats, is slightly novel - not as a proposal (many research efforts claim they will do this), but as an actual completed project, where three different formats have been aligned through such a bridge ontology. However, the paper does not take advantage of this novelty, and simply focuses on the resource itself, rather than on what we can learn from this effort.

This brings us to review criterion (2), significance of the results, which is unfortunately also low in the current form. Following from the discussion above, the actual scientific results are minimal (most results are practical, an artefact); hence, the new knowledge gained is not significant and the paper will not significantly help other researchers build on this in the future.

Regarding criterion (3), quality of writing, the paper is reasonably well written in terms of language and clarity, but it is missing several crucial parts to complete the "storyline", as already mentioned.

More specifically, starting from the related work section, this section covers a broad range of topics, from datasets for sustainability to data models for LCA. However, for each such topic the discussion is brief, and the account of state-of-the-art work does not seem complete. If a selection has been made, then it is not clear why particular work was included and other work left out. For instance, knowledge graphs for sustainability have been a topic for several years, and many datasets have been published, including efforts from Google and other large organisations, linked governmental data, etc. There is also the whole area of linked energy data, which is not even mentioned. Either this section has to be considerably extended, or perhaps the title should be changed - is it really relevant to survey and contrast against all kinds of sustainability efforts? On the other hand, some parts seem incomplete: for example, the PACT formats are not even mentioned in this section, although this work is acknowledged in the introduction as the main alternative (although still emerging), i.e. standardisation instead of mapping between many other formats.

Next, from looking at sections 3 and 4 it is quite unclear what WISER is supposed to be. The title of section 3 suggests that it is the KG that is called WISER, and the title of section 4 then reads as if that section will present the ontology of the KG. However, from the content of the sections it seems that the ontology is already described in section 3 (subsection 3.3), which raises the question of what section 4 is actually about. Or are they different ontologies? Does section 4 present the development methodology and more details of the same ontology described in section 3.3, or are these actually different things? If they are indeed the same thing, then I would suggest starting with section 4, describing the methodology and the details of the ontology, and only after that showing how it can be used to represent data and to map between the standards.

There are also many unclear points in the development process: What is actually the scope of the bridge ontology? And of WISER? Is it only bridging the geographical aspects of these formats? Or are the geographical requirements and mappings just an example? Several of the figures illustrating examples are not sufficiently explained, neither in terms of notation (e.g. what do the arrows between classes and properties indicate in Figure 3? Domain and range restrictions? And what about the dashed lines labelled "mapping" in Figure 4?) nor in terms of their content. It is also not clear why the notation differs between Figures 1 and 3 on the one hand and Figures 4 and 5 on the other.

In section 3.2, the generation of an RDF KG from XML documents is discussed. It is a bit unclear how this fits in with the rest of the paper, which is about the ontology. Does this bring any novelty and scientific contribution, or is it more a part of the use case, i.e. how the ontology can be used? Additionally, the approach is poorly motivated. Why is a translation via Java classes used? Why were mappings, e.g. using RML, ruled out? And what about other kinds of transformation approaches, such as OTTR/OPPL? If this approach is part of the research contribution, it should also be backed by related work, its novelty and generalisability should be discussed, and choices such as the one mentioned above should be better motivated, with results evaluated and discussed. Figure 2 is also not very clear - what do the two arrows mean? The database sends a database to a generic Java class??

Further, sections 3 and 4 need to focus more on the learnings from this work - what are the challenges in creating bridge ontologies? How were they addressed? What are the cases that could not be covered? Why? What other things can we learn from this?

Finally, the main emphasis of the paper should be on the evaluation - this is where we can really learn something, and where the scientific contribution should be grounded. However, for this to be possible, the evaluation should be described in much more detail. The title of the evaluation section 4.2.7 "Set of queries" does not really match the content. I would suggest to make this its own section, called “Evaluation", and then several subsections, e.g. "Evaluation setup", "Evaluation results", "Analysis" etc. While assessing the query performance of the integrated data is an interesting evaluation, actually applying the ontology in its intended use case should also be an essential part. The web application briefly mentioned could be a part of this, but on one hand the description is way too brief, and on the other hand it is not clear whether this has even been used, e.g. in a real use case, by actual users etc. And what can we actually learn from using the ontology for this application? How are we now making reporting or LCA assessment better? What are the gains? In fact, from the paper it is not even clear what the focus of the application is - what does "data gathering" mean? Entering new data into the system, or accessing and gathering data from different datasets/databases?

The paper completely lacks an analysis of the results, and a discussion of limitations and implications of the research.

The supplemental material is comprehensive, but not well documented. For instance, the ontologies lack a documentation page for human consumption, and some ontologies even lack documentation (annotations) in the OWL files (e.g. labels and comments). This makes it difficult to assess the quality of the artefacts themselves.

Minor issues:
- Page 3: "databases that based on a new" -> "that are based on a new"?
- Page 4: "analizing" -> "analyzing"
- Figure 3 - why is one class more orange than the rest? It is also not entirely clear where all the lines go - do they branch out or cross each other?
- What do you mean with TBox-data in section 4.2? Normally data is the ABox.
- Footnotes are missing on page 9.
- Figure 4: What do the different colors mean? Why are some arrows dashed and some not? What does "mapping" mean technically?
- Don't break the listing 1 on two sides of a figure, and on two separate pages.
- Page 11: "grater" -> "greater"

Review #3
Anonymous submitted on 05/Dec/2024
Suggestion:
Major Revision
Review Comment:

Paper Summary

The paper discusses the important and highly significant topic of greenhouse gas (GHG) emissions. The authors introduce a method for translating data from three common XML data formats (ILCD, EcoSpold01, and EcoSpold02) into RDF knowledge graphs and propose ontology mappings (formalised in the Bridge ontology) to enable a common query interface across the datasets described using such formats. In addition, the authors provide additional semantic links as part of their WISER ontology (aligned with GeoNames) to enable more granular geospatial search over location metadata recorded as literal values in the emission datasets.

Strengths:

The paper discusses a highly relevant domain with a plethora of challenges that matter to the semantic web community. I agree with the authors that a semantic approach to GHG emission data could have a significant impact and that this is not very frequently discussed in our community.

The article reports on a semantic integration approach for real world datasets described using established data formats.

The narrative is easy to follow, and the authors provide a link to the GitHub repository containing the datasets used in the experiments, the ontologies, and the additional supporting code.

Weaknesses:

1) Contributions

Could the authors please clarify the main contributions of the article? If it is the KG resulting from parsing existing datasets and described using the bridge ontology, please provide more details on the KG (e.g., size, location, number of datasets integrated, etc.). If it is the method used to integrate the different schemas, could you please extend the description of the process (e.g., how the experts were involved, how agreement was achieved, etc.)? If the main contribution is the Bridge ontology and the WISER ontology, there is not sufficient detail in the article at the moment to assess these resources. For example, in https://raw.githubusercontent.com/researchAndMore/swj/refs/heads/main/On... there are a number of classes that are not discussed in the paper and that also have no comments in the actual ontology file.

2) Modelling

Bridge Ontology

Only simple, straightforward mappings are discussed in the paper with examples; more complex cases are not shown. For example, the authors write: "However, even when classes are not equivalent, we were able to bridge objects and data properties based on the openLCA analysis [16], since some of them provide one-to-one matching, and others denote sufficiently similar properties." What is meant by "sufficiently similar properties" and how was this reflected in the ontology?

It is also not clear how the actual emission values are modelled, which I presume are among the main results of a search.

WISER Ontology

The authors claim that "The bridge ontology described in the previous section allows homogeneous querying of heterogeneously described data. However, it is not capable of fulfilling all the practical requirements of environmental accounting experts when gathering data for GHG reporting."

However, I am not sure it is clear why the bridge ontology is not fulfilling the requirements and what these requirements are.

Section 4.2.3 mentions "selected" CQs, which gives the impression that there are many more in the GitHub repository, but I have only been able to find four, all focusing on the geospatial aspects (https://github.com/researchAndMore/swj/blob/main/SAMOD/CompetencyQuestio...). Are there more? Are there any CQs for the bridge ontology?

re: Geospatial concepts in WISER
I am not entirely sure whether the WISER concepts are really needed, as even properties like :bGeographyParent mirror properties in the GeoNames vocabulary. Could the authors please motivate in more detail why, for example, the :bGeography property should not have the range gn:Feature and then simply use gn:alternateName for the labels?

Please see [1] (missing from the related work and probably relevant) where the observations are linked directly to geonames.

Evaluation:
The evaluation is based on query performance, which is hardware-dependent, but hardware specifications are not reported. It is also not clear how the evaluation datasets were created. I am personally not sure how suitable this form of evaluation is without further context (e.g., how often users need to run such queries in the real world, etc.). A more interesting evaluation would be a user-based experiment to confirm whether the semantic pipeline indeed fulfils the expected requirements of experts.

The Bridge and WISER ontologies are also not evaluated, they are missing proper documentation, and the PURL IRIs are confusing and do not resolve (based on the GitHub repository, the bridge ontology seems to be using https://purl.org/wiser# and the WISER ontology https://purl.org/wiser/).

General comments:

The title mentions "reducing of emissions"; however, it is not clear how the reduction is to be achieved with the technologies discussed in the paper. I think an illustrative motivating example early in the paper would help.

Lines 38-39: "The XML schema tags were defined as OWL classes, and in some cases it was preferred to define them as data properties to connect classes directly instead of using identifiers."

Could you please provide an example?

re: Algorithm

line 35 and 36 "ILCD divides concepts at a lower granularity and distributes them among different XML files, creating dependencies among each other"

How were the links and cardinality restrictions created? The algorithm does not seem to cover this, as it currently shows only the creation of RDF literals.

"Go to 2" -> what does 2 refer to?

A simple side-by-side comparison of an XML fragment and the resulting RDF would help.
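
To make the request concrete, the following minimal sketch prints a small XML fragment next to the triples derived from it. Everything in it is an assumption made for illustration: the element names are not real ILCD or EcoSpold tags, the `https://example.org/lca#` namespace is a placeholder, and `xml_to_triples` is a hypothetical helper, not the authors' algorithm. Like the algorithm as described, it only produces literal-valued triples.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment; element names are illustrative only,
# not actual ILCD or EcoSpold tags.
XML_DOC = """
<dataset id="ds-001">
  <name>Electricity production</name>
  <geography>CH</geography>
</dataset>
"""

BASE = "https://example.org/lca#"  # placeholder namespace (assumption)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def xml_to_triples(xml_text):
    """Map the root element to a typed subject and each child element
    to one literal-valued triple, serialised as N-Triples-style lines."""
    root = ET.fromstring(xml_text)
    subject = f"<{BASE}{root.get('id')}>"
    triples = [f"{subject} <{RDF_TYPE}> <{BASE}Dataset> ."]
    for child in root:
        # Escape backslashes and quotes so the literal stays well-formed.
        value = (child.text or "").strip()
        value = value.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{BASE}{child.tag}> "{value}" .')
    return triples


if __name__ == "__main__":
    print(XML_DOC.strip())
    print()
    print("\n".join(xml_to_triples(XML_DOC)))
```

A comparison of this kind in the paper would also make it visible where links between resources, rather than literals, are created.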

How is the management of IRIs handled if the same concepts are mentioned in multiple files?
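
One common way to handle this, sketched here purely as an illustration and not as a claim about what the authors do, is to derive each IRI deterministically from the concept's kind and its identifier in the source data, so that every file referring to the same concept yields the same IRI. The namespace, the `mint_iri` helper, and the identifier normalisation are all hypothetical.

```python
import uuid

BASE = "https://example.org/lca/"  # placeholder namespace (assumption)


def mint_iri(kind, source_id):
    """Derive a stable IRI from a concept's kind and its source identifier.

    The identifier is normalised (trimmed, lower-cased) so that the same
    concept referenced from different files maps to the same IRI.
    """
    key = f"{kind}:{source_id.strip().lower()}"
    # uuid5 is a name-based UUID: identical input always gives identical output.
    return f"{BASE}{kind}/{uuid.uuid5(uuid.NAMESPACE_URL, BASE + key)}"


# The same flow referenced from two different files resolves to one IRI,
# while a different kind of concept with the same identifier does not.
iri_from_process_file = mint_iri("flow", "ABC-123")
iri_from_flow_file = mint_iri("flow", "abc-123 ")
assert iri_from_process_file == iri_from_flow_file
assert mint_iri("process", "ABC-123") != iri_from_process_file
```

Whether the normalisation step is safe depends on the identifier scheme of each format, which is exactly the kind of detail the paper could state.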

re: data in GitHub
Lines 35-44 together with Figure 5: I found it difficult to understand how the assertions presented in lines 35-44 are implemented using the WISER ontology terms. Is the individual in Fig. 5 of type WisserGeography (subclass of BGeography)? Can you please point me to the correct file in the GitHub repository where the logical constraints are modelled?

I have found

### https://purl.org/wiser#BRERwoCHDE
rdf:type owl:NamedIndividual ;
"RER w/o CH+DE" .

in https://github.com/researchAndMore/swj/blob/main/Ontologies/WISEROntolog...

And I have also found

### https://purl.org/wiser#BEuropeWithoutSwitzerland
rdf:type owl:NamedIndividual ;
owl:sameAs ;
"Europe without Switzerland" .

which is presumably incorrectly linked to the GeoNames instance representing only the European Union?

Misc

Footnotes 10 and 11 are missing from the PDF.

Perhaps https://tec-toolkit.github.io/ might be relevant to look at as well.

References:

[1] Germano, S., Saunders, C., Horrocks, I. and Lupton, R., 2021, September. Use of semantic technologies to inform progress toward zero-carbon economy. In International Semantic Web Conference (pp. 665-681). Cham: Springer International Publishing