Approaches to Visualising Linked Data: A Survey

Paper Title: 
Approaches to Visualising Linked Data: A Survey
Aba-Sah Dadzie and Matthew Rowe
The uptake and consumption of Linked Data is currently restricted almost entirely to the Semantic Web community. While the utility of Linked Data to non-tech savvy web users is evident, the lack of technical knowledge and an understanding of the intricacies of the semantic technology stack limit such users in their ability to interpret and make use of the Web of Data. A key solution in overcoming this hurdle is to visualise Linked Data in a coherent and legible manner, allowing non-domain and non-technical audiences also to obtain a good understanding of its structure, and therefore implicitly compose queries, identify links between resources and intuitively discover new pieces of information. In this paper we describe key requirements which the visualisation of linked data must fulfil in order to lower the technical barrier and make the Web of Data accessible for all. We provide an extensive survey of current efforts in the Semantic Web community with respect to our requirements, and identify the potential for visual support to lead to more effective, intuitive interaction of the end user with Linked Data. We conclude with the conclusions drawn from our survey and analysis, and present proposals for advancing current Linked Data visualisation efforts. (The images in the file below have been reduced in size as compared with the original submission.)
Submission type: 
Survey Article
Responsible editor: 
Krzysztof Janowicz


* Reviewer 1:

Categorisation of users

Response: We agree with the comments. However, for the purposes of this paper, a more refined categorisation would divert from the general aim, which is to examine support for the user outside research and development in Computer Science, and the Semantic Web specifically (which we acknowledge also benefits the "tech-user").
We have therefore clarified the basis on which the categorisation is done, and included additional citations on HCI and user-centred design that discuss user types and their impact on tool design. We have also included in this section a brief discussion of the role of the domain expert who is NOT a tech-user as defined in this paper. (This type of user, the domain expert, IS referred to in relevant parts of the paper, to indicate where domain expertise contributes to the ability to make use of tools.)

Finally, in the findings section we state clearly the basis on which we categorise a tool as targeted to a tech- or lay-user.


Elicitation of design requirements

Response: We acknowledge the concern that we present a requirements list without clearly citing relevant work. This section has been revised to cite clearly the existing work on design guidelines from which we derive, first, the high-level guidelines for usability in general, and especially as they apply to tools that aim to support visual analytics. We supplement these with requirements derived from the tasks that are carried out by both tech- and lay-users while consuming linked data. For the latter we provide predominantly empirical evidence as presented in relevant publications in both the Information Visualisation (InfoVis) and Semantic Web (SemWeb) fields. The greater reliance on empirical evidence (rather than established scientific theory) here is because best practice in the consumption of Linked Data, especially in the use of visualisation options, is not yet firmly established.

In the findings section we restate the requirements, as an introduction to our assessment of the tools and functionality available, and use an additional table to list citations that provide evidence for the validity of each (key) requirement. This table (Table 1) is split into two parts: the first lists requirements for effective analysis, with a leaning toward visual analytics; the second lists requirements based on user tasks for consuming linked data.

We have also revised the sub-sections in §5 (Findings) in accordance with the review comments, to summarise the functionality available overall, citing only instances where a specific tool stands out, rather than reporting on each tool individually.


Evaluation of the functionality available for visualising linked data (Findings section)

Response: (see also previous point) The revision of the sections on requirements provides a clearer description of the basis on which we evaluate the functionality supported by each tool.
We have not found published work describing established guidelines or benchmarks for usability evaluation of linked data browsers, and so are unable to cite or work from established practice in the field. However, general usability guidelines are relevant and applicable; we therefore use an analytical approach to evaluating the tools reviewed, by inspecting the user interfaces and the functionality exposed to end users, structured by the guidelines and requirements specified (which are now clearly cited).
Using the BBC Music Beta project as a baseline, we attempted to follow a path from the same starting point (specified in the paper) to browse to related information. Because the tools do not necessarily return the same information we were unable to follow a fixed path, but attempted to make use of at least the basic exploration functionality available in each tool. Any other restrictions we faced are clearly stated for each tool.
In reporting our findings we also consider the evaluations reported by tool owners.

Tables 2 and 3 have also been updated to reflect the revision of relevant sections of the paper, predominantly the Requirements and the detail in the Findings sections.

The qualitative evaluation approach we use allows a fair assessment of the functionality available in the tools, and is easily replicated (see Carpendale, 2008; Sharp et al., 2007; Shneiderman et al., 2009).


Detail on visualisation techniques

Response: A new subsection, §3.1, has been included to discuss briefly a selection of visualisation techniques and the benefits of different approaches. Additional references have also been included in this subsection and in §3.2 on visual representation and analysis.


* Reviewer 2:

Concern about omitting the relevance of RDFa

Response: We acknowledge the importance of RDFa in presenting linked data in a human-readable format. We have therefore edited the section concerned to state that the use of RDFa is possible; however, this has been defined as returning an XHTML response, as this is what occurs when performing content negotiation. We do not, however, provide more detail here, as it would divert from the main message (in the introduction). The relevance of RDFa is also mentioned in the Findings section, where we state the basis on which tools are categorised as targeted to tech- or lay-users.


Suggestion of additional tools to review

Response: We initially considered the 'FoaF Explorer' but didn't include it since we classed it as an RDF rather than a (specific) linked data browser. We have however included it in the list of other (notable) RDF and linked data browsers listed at the start of the Findings section.

We excluded 'FOAF QDOS' and 'Nitelight', however, because although they make use of linked data we find that they do not focus on browsing and exploratory analysis of linked data, which our survey examines.

The last is essentially a mashup between 'URI Burner' and Microsoft's Pivot Viewer. We tried it out as a good example of multiple tools working together, one of which we had already reviewed. However, it returned a security exception, as the server would not allow access to an XML resource required to use the Pivot view.


Section 1, paragraph 6 states:
"Clear and coherent visualisation of linked data is essential if the Web of Data is to be used outside of the SW community."

Implication that "coherent visualisation" is relevant only for lay-users

Response: We have edited this section to remove the notion of visualisation being 'essential' for lay-users only, so that it makes clear that providing visualisation would 'enable accessibility' to the Web of Data (WoD) and uptake of Linked Data outside of (in addition to) the SemWeb community.


Elicitation of design requirements and degree of coverage

(please see also response to reviewer 1)

Response: We have excluded the quote about the use of linked data in the public domain in §2, in line with reviewer 1's comments about the use of Wikipedia as a citation source.
We have included a new use case, in, (the Research Funding Explorer), to provide a broader view. We have also expanded the BBC use case to mention the larger 'BBC Programmes initiative'.

Further, we renamed §2.2.2 to:
"Why Linked Data? A Public Data Consumption Perspective"
to explain more clearly the aim of this section.

We have also stated that the use cases provide a starting point from which to derive the requirements listed in the paper. The requirements section goes on to cite additional work from which the more complete set of key requirements is derived. Here "key" is the operative word: we acknowledge that we cannot provide an exhaustive list of requirements, so we try to keep them relatively high level as well as practical.

We have included more information in §3.2 (Design Guidelines) to support our reasoning behind using the challenges defined in §2 to motivate our requirements. This again allows us to identify and discuss key requirements, rather than attempt exhaustive coverage of potential requirements or uses with respect to linked data and its visualisation.


§4.1.4, paragraph 1 states:
"Huynh et al. reported that presenting the information as a collection of items was more suitable for the information seeking tasks they support than would a graph representation."

Reviewer comment: "Given the importance of this point, it might be good to briefly mention why they've reached that conclusion."

Response: (see §4.1.4) We have expanded the sentence to include the authors' reason behind their design decision, 'to provide a comprehensive view of data': they felt that a list provided this while a graph would not.


Concern about coverage of the conclusions

(please see also response to reviewer 1)

Response: We have revised the entire 'Findings' section to address comments by both reviewers. This includes the new table and clarification of the information presented in the original tables, in addition to a revision of the layout of the original tables.


"Missing" snapshots

Response: We have located a set of RDF files for the BBC Music Beta pages and have (re)generated some of the images. We have however been unable to obtain useful snapshots for all the browsers.


Use of acronym "MO"

The acronym "MO" is mentioned several times in the paper but no definition is provided. Presumably it is "Music Ontology", however it would be good to define it at least once.

Response: The acronym WAS defined at the first point of use - it DOES refer to the 'Music Ontology'.

Section 1, paragraph 5:
"One of the central issues with large-scale LD production is the accuracy and completeness of links with other datasets."

The LD community commonly agrees on the fact that completeness is a 'nice to have', however, it is not always possible, hence, it is considered to be a non-vital issue.

Response: Reworded to say:
"One of the central issues with large-scale LD production is the accuracy and completeness of links with other datasets. Identifying such links using the solitary RDF format of a dataset limits the reader's ability to identify any errors and incorrect links. The LD community recognise that a complete solution to this challenge may not be possible; however, visualisation of Linked Data may help to resolve this, as it enables the identification of such errors more easily, using, for instance, a graph visualisation."

Section 2 Summary, paragraph 1:
"We work from these two cases to ground our discussion with respect to LD visualisation"

It is not clear why the two examples in this section are particularly used to build the discussion. It might help to briefly elaborate on that before moving on.

Response: We have included a brief summary of the types of scenarios used, chiefly to illustrate the potential in presenting LD to support consumption by BOTH mainstream and technical users:
the and use cases with other LD from public bodies - to illustrate the public interest/mainstream user perspective
BBC Programmes/Music - to illustrate the value of Linked Data for media organisations

Section 2 Summary, paragraph 4:
"Linked data, encoded in RDF and most commonly returned as XML" and "The (default and commonly found) representation – RDF using XML serialisation –"

"Default"ness in particular needs a citation to some stats, or the text should be specific about whether it is referring to RDF serializations that are available from SPARQL endpoints, or to published data. Otherwise, it could be argued that the majority of SPARQL endpoints nowadays offer several RDF serializations for their datasets, where RDF/XML is one serialization format among Turtle, N-Triples, and RDF/JSON.

Response: We have edited this passage to include the other forms of RDF serialisation as examples, as the intention here is to focus on machine-friendly rather than human-friendly presentation. We have also included new references that describe the XML serialisation and its benefits and limitations, and that confirm it is the more commonly used for publication, among other reasons because it is the W3C recommendation. The evidence is empirical, however, not statistical.

Section 5.7.1. Data Verification & Validation:

Might be good to mention Sindice Inspector