Trust-based retrieval of artwork attributions in digital libraries

Tracking #: 2349-3562

Marilena Daquino
Enrico Daga

Responsible editor: 
Special Issue Cultural Heritage 2019

Submission type: 
Full Paper
Validating argumentations around attributions is a well-known issue in the cultural heritage domain, where competing sources offer contradictory information on the same artefacts. To date, data aggregators allow users to retrieve heterogeneous information faster. However, contradictory information is rarely handled and argumentations are unlikely to be processed due to a number of limitations, namely: arguments are usually recorded in non-machine-readable formats, attributions are not integrated with other sources on the web, there is no shared mechanism for ranking attributions, and data may suffer from Information Quality (IQ) issues over time. In this article we argue that Semantic Web technologies can effectively facilitate data harmonisation tasks, can support users' decision-making process when appraising online secondary sources recording artwork attributions, and can spare cultural heritage institutions expensive curatorial efforts. In detail, we introduce an ontology for representing argumentations around attributions, methods for measuring Information Quality in the Arts domain, and an ontology-based recommending system of artwork attributions. The aim is to demonstrate the suitability of Semantic Web technologies for solving trust-related problems in the Arts field, and to highlight the portability of the developed methods to neighbouring fields.


Solicited Reviews:
Review #1
Anonymous submitted on 20/Nov/2019
Review Comment:

The paper provides a complete problem statement, solution and validation for dealing with trust-based retrieval of artwork attributions in digital libraries.

(1) Originality
The idea of addressing the integration of contradictory information in terms of attribution and using Semantic Web solutions is an original contribution. The paper provides an approach for enabling the validation of argumentations around attributions in the cultural heritage domain, where sources offer contradictory information on the same artefacts. The problem is “original”; the Semantic Web solution is not surprising, although it is of course pertinent and makes sound use of Semantic Web techniques.

(2) Significance of the results.
The paper reports on a Semantic Web based solution to support the decision-making process based on access to online secondary sources recording artwork attributions, and to reduce the curatorial efforts of cultural heritage institutions. The paper presents an ontology for representing argumentations around attributions, methods for measuring Information Quality in the arts domain, and a recommendation system for artwork attributions.

(3) The paper is well written with many details, examples and validation. It can be a reference for scientists addressing similar problems in the e-Humanities and Information Retrieval disciplines. Figures 3 and 4 are too small and too dense to be analyzed. Figure 5 is important for reporting the experimental results, yet it is not very clear: the annotations are too small.

Review #2
Anonymous submitted on 18/Dec/2019
Major Revision
Review Comment:

This 24-page paper presents a use case of Semantic Web technologies dedicated to artwork attributions in digital libraries. The authors demonstrate that trust in artwork attributions is really challenging. This task requires access to different datasets describing artworks and artwork attribution statements.

The state of the art section claims that there is no vocabulary to describe artwork attribution statements. As far as I understand the proposal, the artwork attribution decision is a kind of argumentation between different expert decisions. Some organizations support one expert decision. An expert decision challenges another expert decision. The CiTO ontology describes part of the argumentation between expert decisions. The state of the art section should contain a paragraph about argumentation between expert decisions.
See for example "A review of argumentation for the Social Semantic Web", DOI: 10.3233/SW-2012-0073.
The authors should explain why CiTO is sufficient for describing argumentation between expert decisions about artworks.
The section about “representation provenance and trust in the semantic web” contains two parts: one about trust and provenance description and one about query federation. The query federation part is more focused on implementation problems and should be separated more clearly into its own section.

The proposal starts with a description of the artwork attribution statement. The description is composed of three layers:

Layer 1) The description of the artwork (author, date of creation, ...) is based on the CIDOC Conceptual Reference Model. This model has several versions. Which version is used in the work? The associated reference is an article from 2003. The latest version of the model that I found on the web is version 5.0.4, published in November 2011.

Layer 2) The description of the artwork attribution. This description is based on the Historical Context Ontology (HiCO) developed by the authors. This ontology is already published on the LOD. Three journal and conference articles describing this ontology have already been published. This ontology is based on the PROV ontology. As far as I understand the proposed work, I find the model a little ambiguous. Some illustrative examples would help the understanding of the HiCO model. For example, the interpretation criterion property seems to be a mix of:
* the author of the attribution (a scholar)
* an agent that supports the attribution made by an expert: a market or a museum that supports the attribution made by a scholar
* and a source where the author of the attribution describes their attribution of the artwork (a scholar’s note on a photograph).

Layer 3) The description of the source where the artwork attribution is defined. The source is in fact the result of an extraction process applied to a digital catalogue or a generic knowledge base like DBpedia. This layer is described by the Named Graphs data model.

The second part of the proposal consists of trust measurements. These measurements are based on generic information quality measurements such as relevance (the number of organizations that agree on the statement), reputation (the authority of the scholar who produced the statement), reliability (the “type of source” used to build the statement) and timeliness (the latest statement is the best one).

Third, the crawler mAuth is described. It computes those trust measurements based on several LOD datasets.

Finally, a qualitative evaluation of the mAuth crawler and the associated trust measurements is presented.

I am not at all confident about my understanding of the types of attribution sources. But it seems interesting to make a distinction between:
1) an expert (a scholar) who produces the attribution
2) the organization that credits or supports the attribution of the expert.
3) the document where the expert declares one attribution. I understand it is a scholarly article.
4) the value of the attribution (the creation date of the artwork or the author name of the artwork).
5) the sources that help the expert to create the attribution. It seems that the sources are often photographs of the artwork (an image of the artwork where the signature is visible).
6) the creation date of the attribution.
7) the type of attribution (creation date of the artwork or author of the artwork)
8) the artwork
This information seems to be stored in only 6 HiCO properties. I have to say that I do not understand what a type of source is (is it 3, 5 or 8?).
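
To make the eight-way distinction above concrete, the items could be held as separate fields, as in the following minimal Python sketch. The class name, the field names and the example values are all hypothetical illustrations; they are not HiCO properties and are not taken from the paper under review.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class AttributionRecord:
    """One attribution statement, split along the eight items listed above.

    All field names are hypothetical; they do not correspond to HiCO terms.
    """
    expert: str                        # 1) scholar who produces the attribution
    supporting_org: Optional[str]      # 2) organization crediting or supporting it
    document: str                      # 3) document where the attribution is declared
    value: str                         # 4) attributed value (author name or date)
    evidence: Tuple[str, ...]          # 5) sources used by the expert, e.g. photographs
    created: date                      # 6) creation date of the attribution
    kind: str                          # 7) "author" or "creation_date"
    artwork: str                       # 8) the artwork being attributed


# Invented example values, for illustration only.
example = AttributionRecord(
    expert="Scholar A",
    supporting_org="Museum X",
    document="Scholarly article by Scholar A",
    value="Artist B",
    evidence=("Photograph showing the signature",),
    created=date(2019, 1, 1),
    kind="author",
    artwork="Artwork C",
)

field_names = [f.name for f in fields(AttributionRecord)]
```

Keeping the document (3), the evidence (5) and the artwork (8) in distinct fields removes the ambiguity about which of them counts as the "type of source".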

Thus some HiCO properties are not clear enough. I need examples to better understand the HiCO model.

The link between Table 1 and the interpretation criterion property is not clear to me. The vocabulary presented in Table 1 seems to be redundant with some properties of the HiCO model.

Moreover, I do not understand where the value of the attribution (the creation date of the artwork or the author name of the artwork) is stored. If it is stored in the artwork description, the dataset must hold several versions of the artwork description, one per artwork attribution. I find this solution strange. The data manager of the artwork attribution description is not necessarily the data manager of the artwork description. So how does the system keep the whole description of the artwork attribution consistent if the artwork description changes?

The PROV ontology could be used more extensively to express all the decisions concerning the artwork attribution. First there is the attribution created by an expert, then the support of this attribution by an organization. The organization has charged the expert with making the attribution. The prov:wasInformedBy property could be extended to express the link between the support and the attribution description. The link between these two activities can be useful to extract the fact that several organizations support the same expert.
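
The suggestion above can be sketched with plain Python tuples standing in for RDF triples. This is only an illustration under stated assumptions: the `ex:` identifiers and the helper `supporters_by_expert` are hypothetical, and modelling a support as an activity that `prov:wasInformedBy` the attribution activity is one possible reading of the suggestion, not the model used in the paper. Only `prov:wasInformedBy` and `prov:wasAssociatedWith` are actual PROV-O terms.

```python
from collections import defaultdict

# (subject, predicate, object) triples; every "ex:" identifier is hypothetical.
triples = [
    ("ex:attribution1", "prov:wasAssociatedWith", "ex:scholarA"),
    ("ex:support1", "prov:wasAssociatedWith", "ex:museumX"),
    ("ex:support1", "prov:wasInformedBy", "ex:attribution1"),
    ("ex:support2", "prov:wasAssociatedWith", "ex:auctionHouseY"),
    ("ex:support2", "prov:wasInformedBy", "ex:attribution1"),
]


def supporters_by_expert(graph):
    """Map each expert to the set of organizations supporting their attributions."""
    # Agent responsible for each activity (attribution or support).
    agent_of = {s: o for s, p, o in graph if p == "prov:wasAssociatedWith"}
    result = defaultdict(set)
    for support, predicate, attribution in graph:
        if predicate != "prov:wasInformedBy":
            continue
        expert = agent_of.get(attribution)
        organization = agent_of.get(support)
        # Skip activities whose responsible agent is not recorded.
        if expert is not None and organization is not None:
            result[expert].add(organization)
    return dict(result)


supporters = supporters_by_expert(triples)
```

With the two activities linked in this way, the organizations that support the same expert fall out of a single pass over the graph.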
I like the work on philosophical influence in the Pundit project, which reuses the CiTO ontology.
Grassi, M., Morbidoni, C., Nucci, M., Fonda, S., Di Donato, F. (2013) ‘Pundit: Creating, Exploring and Consuming Semantic Annotations’. In Proceedings of the 3rd International Workshop on Semantic Digital Archives, 1091: 65-72.
I have some questions about influence. Can influence be used in the HiCO ontology to deduce which expert has the most influence on artwork attributions for a given period or artist? The acceptance rating defined in the paper seems to produce this information numerically. However, I do not understand which HiCO property is used to describe that influence relationship.

The artwork description uses the CIDOC CRM model, not the FRBR model. Why does the HiCO ontology reference the FRBR model in the hico:isExtractedFrom property? Maybe there is a link between CIDOC CRM and FRBR. But it would be more consistent to keep the same model for the artwork and the information source descriptions. The artwork is also a kind of information source.

Some attribution work seems to be documented by photographs. It could be useful to extend the HiCO ontology with the Web Annotation Data Model to describe scholars' notes on photographs.

Why does the cito:refutes property link two hico:InterpretationAct individuals, while the cito:agreesWith and cito:citesAsEvidence properties do not?
Note that the CiTO properties have no domain and range defined in the CiTO ontology. How, then, should the cito:refutes property be interpreted in the HiCO ontology? Is it a usage link?
Why are only three CiTO properties reused? It seems that other properties, such as cito:disagreesWith, could be reused. In the HiCO ontology, is the cito:refutes property the logical antonym of cito:agreesWith? Is there any reasoning associated with those properties?
Maybe the HiCO authors should specialize the CiTO ontology for their usage and define new properties.

In order to better understand the HiCO ontology, I have read the paper
"Historical Context Ontology (HiCO): a conceptual model for describing context information of cultural heritage objects".

In this paper, the HiCO ontology is used to describe letter transcription. As far as I understand this paper, I do not agree with the proposed description of transcription. A transcription text is defined as a new realization of the original letter. The transcription text and its alignments to the original letter are the realization of the transcription (the interpretation). The transcription text is not a realization of the original letter. The author of all the realizations and of the original work is the same, but the editors of the different realizations may vary.
Thus the example about letter transcription confuses me more than it helps. The presented work is a new usage of the HiCO ontology.

The section about metrics should follow the same order as the introduction in order to help the reader.

I find it strange that scholars whose expertise is focused on a few artists are penalized. The section about the scholar reputation metric proposes two metrics. In the end, the reputation metric presented in Table 2 is a Boolean. This Boolean seems to have no link with the scholar reputation metrics.

I do not understand how the reliability is computed. What is the criteria score b (page 11, line 8)? Is it the vocabulary of the thesaurus extended with scores? It seems that the HiCO model is not fully described in Figure 1. The thesaurus is missing. The thesaurus is not formalized as a SKOS model.

In the description of the crawler, I would find it more useful to have a whole graph example with rdfs:label values than code with URIs.
For example, in Listing 3, what is the mauth:hasHindex property?
Only the HiCO model is partially presented, but to understand the crawler we also need a description of the statistic graph model and the observation graph model.

The qualitative study is carried out with different user profiles. Table 4 presents 30 users, whereas page 18, line 47 mentions 31 users. As far as I understand, only scenario 1 compares 2 systems with the mAuth crawler. The two other evaluation scenarios are focused on the usability of mAuth. The evaluation is based on Google Forms that users fill in. A scale from 1 to 5 is used. Figure 5 needs colors to be more understandable.

I do not understand why the authors have preferred to build snapshots of some attributions rather than a new knowledge graph storing all attributions. It would be easier to compute the veracity of the attributions if the knowledge graph were complete. The whole knowledge graph could provide some information about who influences whom. The evolution of attributions seems to be out of the scope of the presented work. Evolution can be an indication of attribution quality. I need some explanation of this choice. I understand that attributions have evolved, which is the reason why trust is complicated to evaluate. As far as I understand, the HiCO model does not take the evolution of attributions into account.

Artwork attribution seems to be an original use case for querying LOD datasets. The paper presents a huge amount of work. Unfortunately, I need more explanations to be convinced by this work. I have some difficulty understanding the HiCO model and some of the trust metrics.

Review #3
By Allel Hadjali submitted on 18/Feb/2020
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The paper presents an approach that uses Semantic Web technologies to enable trust-based retrieval mechanisms for artwork attributions in digital libraries. A use case is described as well.
The paper contains interesting ideas and it is written in an intelligible way. However, the paper reads as a set of factual descriptions and it badly lacks formal/mathematical notions. Moreover, many choices have been made with no justification.
Below are some comments and remarks to improve the paper.
- In section 5, the temporal aspect of attributions is considered in the HiCO ontology. As you know, time is often of a gradual/fuzzy nature. Did you represent this characteristic of time?
- On page 9, reliability is rated between 1 and 10. Could you provide some justification for this choice?
- On page 10, the reliability dimension is measured on the basis of twenty-two terms. Why this choice?
- I was somewhat surprised by the Boolean measure of the domain expert score. Intuitively, this dimension is of a gradual nature, i.e., a person is an expert in a given domain to some extent.
- In 6.4, several partial scores are obtained. The ranking model needs a global score. I was wondering which function/operator is used to aggregate those partial scores.
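
To illustrate what is being asked for in the last remark, the following minimal Python sketch aggregates partial scores with a weighted arithmetic mean and ranks attributions by the result. It is purely hypothetical: the operator actually used in the paper is not stated here, the function name `global_score`, the weights and the score values are invented, and the partial scores are assumed to be already normalised to [0, 1].

```python
from typing import Dict


def global_score(partial: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted arithmetic mean of partial scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[name] for name in partial)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(partial[name] * weights[name] for name in partial) / total_weight


# Equal weights over the four dimensions named in Review #2 (an assumption).
equal_weights = {
    "relevance": 1.0,
    "reputation": 1.0,
    "reliability": 1.0,
    "timeliness": 1.0,
}

# Invented partial scores for two competing attributions.
attributions = {
    "attribution_a": {"relevance": 0.5, "reputation": 1.0,
                      "reliability": 0.75, "timeliness": 0.25},
    "attribution_b": {"relevance": 1.0, "reputation": 0.0,
                      "reliability": 0.5, "timeliness": 0.5},
}

# Rank attributions from highest to lowest global score.
ranking = sorted(
    attributions,
    key=lambda name: global_score(attributions[name], equal_weights),
    reverse=True,
)
```

Other operators (minimum, ordered weighted averaging, lexicographic ordering) would rank the same attributions differently, which is why the choice needs to be stated and justified.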