Nordic Spatial Humanities: Ups and Downs in LOD Implementation across Humanities’ Digital Spatial Research Infrastructures in the Nordic Countries

Tracking #: 3754-4968

Authors: 
Alexandra Petrulevich
Henrik Askjer
Sara Ellis Nilsson
Peder Gammeltoft
Andreas Lecerof
Emily Lethbridge
Øyvind Liland Gjesdal

Responsible editor: 
Oscar Corcho

Submission type: 
Application Report
Abstract: 
The article constitutes a report of a LOD application attempt undertaken within the humanities’ spatial research (SRI)/spatial data infrastructure (SDI) sector. The case study is carried out on the geocoded data, mostly place-names and place-name attestations, of the four chosen Nordic SRIs: Icelandic Saga Map, Mapping Saints, Norse World and Norwegian Place-names. Ontologically, the case study aims at the implementation of the Linked Art Data Model across the four resources. Methodologically, the SRIs’ data went through cleaning, transformation and augmentation stages. The results section demonstrates that the outcomes of the LOD implementation and the test querying have been uneven, which is explained partially by the differences between the SRIs as well as by time constraints. In the discussion, the article reflects on possible alternative methodology, technological challenges such as scalability as well as its contribution to the field.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 14/Dec/2024
Suggestion:
Reject
Review Comment:

The paper discusses the results of two workshops, held in September 2022 and May 2023, conducted under the Nordic Spatial Humanities (NSH) project (2022–2024). The authors present lessons learned and outcomes from these workshops. The paper claims the creation of a functional RDF output derived from three of the four data sources (ISM, NW, and NPN). This output was published through a search portal where the spatial aspect served as the common denominator across the data sources.

1. Quality, Importance, and Impact

Quality. The work appears below average in quality. The description of LOD integration requirements, data sources, architecture, and exploitation is minimal. The project involves data augmentation and aggregation of ISM, NW, and NPN datasets with spatial features, but lacks validation of results. The authors admit that time constraints and limited familiarity with certain ontologies negatively influenced outcomes.

Importance. The project data, code, and related resources have been available during the project’s lifetime (2022–2024). However, the decision to discontinue the resources post-project suggests the authors implicitly acknowledge that the outcomes have limited importance.

Impact. The work’s impact seems to be limited to the NSH project’s participants. The paper does not provide details on the number of workshop participants, follow-up activities, or broader influence. Additionally, the decision not to maintain project outputs diminishes its potential long-term impact.

2. Clarity and Readability

The paper is challenging to read, with issues in organization and formatting. Furthermore, the paper does not adhere to the recommended SWJ format.

3. Long-Term Stable URL for Resources

Not available. In addition, the application lacks a GitHub, Figshare, or Zenodo repository.

Detailed Review

- The term “spatial research infrastructures” (SRIs) seems to be coined by the authors. Existing literature in digital humanities already addresses spatial information extensively without using this term.
- The paper does not establish how its approach differs from or improves upon similar work.
- The three research questions are overly ambitious and cannot be addressed only through insights from two workshops. A more detailed and technical research work would be required.
- The concept of a “spatial humanities infrastructure” (p. 15) and its requirements are insufficiently explained. Furthermore, the claim that “showcasing a semantic portal may not have been the most suitable case” is not backed up by evidence.
- The claim that the spatial aspect is the main common denominator between datasets requires further data to discard other alternatives.
- Detailed data on the size of the input datasets and the resulting integration dataset is missing.
- The design of the URI is missing.
- It is not clear which parts of the system were developed during which workshop. A timeline would improve clarity.
- There is no quantification of time or resources invested and the quality of mappings. The paper should include quality measures and effort required.
- The claim of LOD silos requires data supporting it.
- The bibliography contains broken links (e.g., https://www3.uu.se), non-standard references (e.g., RDF, RDFS), and duplicate references (RDF and W3C/RDF).

Review #2
By Sarah Rebecca Ondraszek submitted on 02/Jan/2025
Suggestion:
Major Revision
Review Comment:

The submitted application report discusses the Nordic Spatial Humanities project and an integrated case study that evaluates the advantages and disadvantages of applied Linked Open Data (LOD) for infrastructures focusing on geocoded humanities data in the Nordic countries (using exemplary data from the Icelandic Saga Map, Mapping Saints, Norse World, and Norwegian Place-names).
To harmonize and link the data across these different Spatial Research Infrastructures (SRIs) and make them queryable simultaneously, the authors propose the creation of a Linked Art Data Model at the ontological level and a standardized methodological approach for data processing.

The report provides valuable insight into the current state of SRIs in the field of digital humanities and cultural heritage in Nordic countries. The strengths of the report are its structured methodology for implementing LOD across the four chosen case studies and the clear communication of the advantages and disadvantages of LOD implementations. Additionally, the approach to include several collaborators in workshops proves an understanding of the importance of user involvement in the development process of linked data applications.
The application report's significance lies in its provision of insights into interdisciplinary endeavors concerning geospatial data, with a focus on harmonizing diverse sources to enhance interoperability and reusability. The authors adopt a dual approach, incorporating ontological and methodological/pipeline-oriented perspectives on the project, thereby offering a comprehensive and multifaceted examination of the subject matter. The report demonstrates notable self-awareness, as evidenced by its identification of the framework's and ontology's limitations on pages 14 to 16.
One area for improvement in this report concerns the inaccessibility of the project’s results. The paper should clearly reference the obtained results, ensuring they are accessible for a review and a better understanding of the value of the contribution (also regarding the aspect of convincing evidence for the impact of this study). This would concern, for example, the ontology.
The authors emphasize the significance and impact of the present study by underscoring the role of SRIs and geospatial data in the realm of digital humanities and cultural heritage research. The authors acknowledge the necessity to enhance the existing structures and to establish interconnectivity among resources in view of the growth of available data. They underscore the imperative for sustainability and data reuse to open up future research, a principle that finds application in the FAIR principles and the concept of Open Science.
The report indicates a challenge posed by the heterogeneity of infrastructures, attributable to two factors. Firstly, the unique structures of geospatial SRIs in Scandinavia are characterized by data stemming from diverse sources within the humanities and cultural heritage sectors, such as literary sources, manuscripts, maps, and sound files. Secondly, the report highlights divergent requirements for the usability and objectives of a developed infrastructure, also addressing the need for an improved dialogue between recommended bottom-up approaches and top-down recommendations, which have so far not been finally formulated. With the Nordic Spatial Humanities, they address both issues equally, with a standardized framework and the engagement of various stakeholders in workshops. They included researchers, governmental agencies, and international initiatives (DARIAH-EU, Google, Open Geospatial Consortium, World Historical Gazetteer project, Australian Time Layered Cultural Map project).

In order to enhance the report and demonstrate the study's significance, I would recommend including quantitative measures for the datasets in the introduction and in the results section. This could include, among other aspects, how many data points are included, what the size of the dataset is, what the LOD landscape looks like (e.g. the coverage of LOD application in SRIs in Nordic countries), and an evaluation of applied semantic technologies (outside of the four observed infrastructures). An example: evaluate how many of the identifiers from Wikidata have been used in Mapping Saints, and check how the LOD-ification in the project changed its semantification.
This relates to the need for a broader contextualization: I recommend contextualizing the findings within the larger framework of LOD initiatives in the humanities and digital research. This is especially important because initiatives like DARIAH-EU were mentioned. Additionally, the authors should explain why the four SRIs were chosen with a clear rationale. This could include discussing the unique characteristics of each SRI, how they relate to the research questions, and how they represent a range of common challenges in similar projects. Such a rationale could also be used to compare the four SRIs based on their content and results in the discussion section.
The choice of CIDOC-CRM and the Linked Art Model is clear. However, the design process of the ontology should be defined in more detail on pages 10 and 11. Instead of introducing RDF, it would be interesting to learn more about the design process behind the choice. For example, was a certain design methodology used? Was user feedback involved? This could be done in the method section, before the description of the data transformation process.
The "What about MS?" section could be shortened and replaced with a comparative analysis. Additionally, a quantitative evaluation of the results would strengthen the concluding remarks. A comparative analysis of the data before and after the LOD implementation, or the number of successfully converted resources, would be a valuable addition. However, in its current state, the section lacks a comparison against existing frameworks. Equally, the authors should address the state of the art of similar projects, also in terms of ontology design and methodological framework.

Overall, the paper is well-written and easily comprehensible. However, there are a few irregularities in the use of acronyms throughout the paper (some acronyms are partially introduced, while others are not introduced, e.g., SSH, GIS, etc.) and missing references for fundamental concepts, such as FAIR or Open Science principles, or mentioned frameworks, like Iconclass. Also, a few headings are inconsistent in their capitalization.
Furthermore, a few sections appear slightly imbalanced and could be shortened accordingly. This concerns the definition of core concepts, such as RDF, and the detailed explanation of the MS use case. These sections could be edited as mentioned in the previous paragraph, replacing parts with a comparative analysis of the use cases.
The authors should align the visual representations of the concepts across the projects (pages 5, 6, and 7), as well as the CIDOC and Linked Art model (page 10), to a similar graph style to ensure consistency and clarity in the presentation of information.
Finally, the addition of visual aids could potentially enhance clarity for the reader by illustrating the met challenges. For instance, Figure 8 could be expanded to include exemplary data from one use case and then compared to another to highlight the connection (spatial resources) and identify which aspects might require more attention.

Review #3
Anonymous submitted on 07/Jan/2025
Suggestion:
Reject
Review Comment:

This article presents the results of a project led by Scandinavian researchers to set up a Spatial Research Infrastructure (SRI) based on semantic web technologies. The aim of this infrastructure is to improve the reuse, linking and analysis of data initially available in 4 different infrastructures in order to facilitate research in digital humanities. The datasets considered seem heterogeneous but share the common feature of containing geospatial data.

Although the topic is of interest to the community, in my view the article has a number of shortcomings that make it incompatible with the journal requirements.

While the article was submitted as an ‘Application Report’, it instead presents a high-level view of the reflections carried out by researchers as part of the Nordic Spatial Humanities project (funded by NordForsk between 2022 and 2024), particularly those relating to two workshops that were organised. The article does not present in detail the originality and novelty of the infrastructure integrating the 4 existing SRIs, but rather a general overview of these existing SRIs and the directions that have been considered for choosing an ontology to describe the data and for querying them. This choice is directly indicated in the introduction, which states the 3 research questions that the article attempts to answer. These are not focused on the application but on an attempt to establish good practices for the use of semantic web technologies in digital humanities projects. The LOD principles are highlighted in the research questions and throughout the article as a strong motivation for the project. However, it is regrettable that the authors do not describe what they understand by these principles. In particular, it would have been interesting to indicate how these principles are to be understood in the context of the digital humanities. The same applies to the FAIR principles. From my point of view, making a resource ‘Findable’ or ‘Reusable’ (etc.) does not have the same impact depending on the type of users targeted, particularly when it concerns SSH researchers. Another important point in the questions tackled is the specificity of humanities materials in Nordic countries. This point deserves to be developed, as the article only puts forward general considerations such as the heterogeneity of materials, the variability of place names and the variability of expressions used to locate them. In my opinion, these points are common to a large number of geolocalised data sources.

Another shortcoming of the paper is its lack of positioning in relation to the state of the art. On the one hand, this concerns the ontologies used to represent territorial units. Several have been proposed in addition to the 2 considered.
- http://data.ign.fr/def/geofla
- http://rdf.insee.fr/def/geo
- http://data.ordnancesurvey.co.uk/ontology/admingeo/
- https://rdfdata.eionet.europa.eu/ramon/ontology.rdf
- http://rdfs.co/juso/
- http://www.geonames.org/ontology
- Hiebel, G., Doerr, M., Eide, Ø.: Crmgeo: A spatiotemporal extension of cidoc-crm. International Journal on Digital Libraries 18(4), 271–279 (2017)
- Kauppinen, T., Henriksson, R., Sinkkilä, R., Lindroos, R., Väätäinen, J., Hyvönen, E.: Ontology-based disambiguation of spatiotemporal locations. In: IRSW (2008)
- Kawtar, Y.D., Hind, L., Dalila, C.: Ontology-based knowledge representation for open government data. International Journal of Intelligent Systems and Applications in Engineering 10(4), 761–766 (2022)
- Bernard, C., Villanova-Oliver, M., Gensel, J., Dao, H.: Modeling changes in territorial partitions over time: Ontologies tsn and tsn-change. In: Proceedings of the 33rd Annual ACM Symposium on Applied Computing (SAC ’18), pp. 866–875 (2018)
- Charles, W., Aussenac-Gilles, N., Hernandez, N.: HHT: An Approach for Representing Temporally-Evolving Historical Territories. In: ESWC 2023, pp. 419–435 (2023)

It would be interesting to discuss why these have not been considered. On the other hand, the temporal aspect linked to the spatial data has not been explored. The approaches mentioned above take account of this dimension, which is important when considering digital humanities resources. In the introduction, this aspect is mentioned as an objective of the project. However, little information is given on how this aspect is dealt with, in particular how places whose names or locations may have changed over time are handled while remaining the same entity.

What is more, the originality and novelty of the application were not sufficiently appreciated. Even if no finalised application resulted from the project, it might have been interesting to flesh out the recommendations and good practices to be developed beyond the failures reported. The workshops that were held led to the production of video resources, and it would have been interesting to highlight how these video presentations could help other SSH researchers to appropriate the technologies, or how they could be supplemented to make this possible.

Finally, the application provided did not live up to the SWJ expectations. Long-term stable URLs have been defined for some resources but not for all of them, and it is not possible to access a long-term stable URL for the entire application. This is noted in the cover letter sent by the authors, but without such a URL it is difficult to evaluate the application.

In addition to the points raised above, the formatting and the lack of respect for the format of the bibliography make the document difficult to read.

Here are some more detailed comments on the content of the paper.
In the introduction, what does “token attestations” mean? Examples could be given in order to better explain the difference with “place-name attestations”. This could help in understanding the differences between the 4 SRIs considered.

The figures presenting the data model of the first 3 SRIs could use a common formalism (with UML diagrams, for example). As they stand, it is difficult to understand, analyse and compare them. In addition, a paragraph should be added to describe the content of each of them. The 4 different SRIs deal with locations covering different geographical areas and time periods. It would have been interesting to specify the concrete motivation and potential for integrating these data into a single SRI based on use cases before presenting the global approach.

For the ISM project, it is stated that “environmental concepts that build on the CIDOC-CRM” are considered. However, these are not highlighted in the figure and their integration into the proposed database is not specified. The authors could also explain how “The great potential to link the ISM geo-spatial data with other comparable datasets … clear from the outset” has been considered in the proposed data model. I have the impression that the temporal aspect is considered for individuals and manuscripts but not for places. For this SRI, are places considered to be timeless? Are they considered to be the same entity that two manuscripts describe, even if they concern a different period? How are the places that contain the modelled place represented? The figure shows “parish, country, etc.”, but how is this aspect managed in practice?

For the Mapping Saints project, information could be given on how the data model considers the link with “data from previous projects and to national authorities”. In Figure 2, it seems that the description of a place is marked in time. What identity criteria are considered when deciding whether to create a new place with the same name? Does each mention of a place over a different period lead to the creation of a new entity? How are thesauri considered in the SRI?

The Norse World SRI seems to be dedicated to a place name nomenclature. Has the proposed model been compared with that of other Nomenclatures? Is the temporal aspect of the use of these names taken into account? This does not appear to be the case from Figure 3, although I think this information could be of interest.

For the Norwegian Place-names SRI the data model could be given to explain how the different entities are represented.

When presenting the framework, for me, the part presenting RDF is not of great interest. More information could be given on the CIDOC-CRM elements considered and in particular their link with the elements already present in the 4 SRIs considered. A description of how the names and the temporal aspect linked to the places can be taken up could be added. Moreover, as Sampo-UI is included, the authors could explain whether they have considered the ontologies proposed in the SAMPO project.