Declarative construction of knowledge graphs from NETCONF data sources

Tracking #: 3744-4958

Authors: 
Ignacio Dominguez
Luis Bellido
Diego Lopez

Responsible editor: 
Guest Editors KG Construction 2024

Submission type: 
Ontology Description
Abstract: 
The knowledge graph paradigm is drawing attention in the network industry as a technology for integrating heterogeneous data silos such as model-driven telemetry based on the YANG language. In this sense, declarative mapping languages have emerged as scalable and flexible solutions for constructing knowledge graphs. A prominent mapping language is the RDF Mapping Language (RML), which enables the integration of heterogeneous data sources by reusing ontologies that describe access to them. However, when it comes to the network domain, there is a lack of ontologies that describe access to YANG data exposed by network devices. This paper introduces the YANG Server Ontology for describing YANG servers and the interactions with them using network protocols like NETCONF. Additionally, guidelines for reusing the ontology in RML mappings are provided and validated in a use case by extending a reference RML engine.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Pano Maria submitted on 24/Sep/2024
Suggestion:
Accept
Review Comment:

This paper introduces the YANG server ontology as a way to describe YANG servers and to express access to YANG data exposed by them.
It leverages existing protocols and describes a binding to these protocols in the ontology.
Furthermore, it describes how this ontology can serve as a data source description language for the RDF Mapping Language (RML), such that the YANG data can be used to build knowledge graphs.
Finally, the paper shows a use case where the combination of the ontology and an RML processor is used to generate a knowledge graph using the existing YANG catalog and YANG library data.

Since the YANG language is being supported by an increasing number of vendors in the network devices space, this is a significant effort.

The paper is clearly written and well presented. The ontology was developed following proven methodologies for ontology management and was also tested against ontology pitfall and FAIRness assessors. The paper highlights the key elements in the ontology and describes their meaning and intended usage.

Furthermore, the paper describes how the ontology can be combined with the emerging RML ontology to enable knowledge graph creation from network devices. The paper also suggests some points which could be added to the ongoing RML community effort to further support the knowledge graph creation process.

One point of improvement, which would make the provided examples of expressing queries with the ontology clearer, would be to add a listing with some example source data that corresponds to the query descriptions. This would especially aid in understanding the workings of the ys:SubTreeFilter, which leaves room for interpretation.
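To make the request above concrete, the following is a minimal sketch of the kind of NETCONF subtree filter that such a listing could illustrate. It uses only the Python standard library to build a `<get>` request in the NETCONF base namespace from RFC 6241, with a filter over the `modules-state` container of the ietf-yang-library module. The function name `build_get_with_subtree_filter` and the choice of selected nodes are assumptions made for illustration; how ys:SubTreeFilter serializes a filter in the paper is not shown here.

```python
import xml.etree.ElementTree as ET

# Base NETCONF namespace defined by RFC 6241.
NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
# Namespace of the ietf-yang-library YANG module.
YANG_LIBRARY_NS = "urn:ietf:params:xml:ns:yang:ietf-yang-library"


def build_get_with_subtree_filter(message_id: str) -> ET.Element:
    """Build a <get> RPC whose subtree filter selects modules-state/module/name.

    Hypothetical helper for illustration only; it is not part of the
    ontology, the paper, or any RML engine.
    """
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    get = ET.SubElement(rpc, f"{{{NETCONF_NS}}}get")
    # The filter type attribute is unqualified, as in RFC 6241.
    flt = ET.SubElement(get, f"{{{NETCONF_NS}}}filter", {"type": "subtree"})
    # Containment nodes narrow the selection to the module list.
    modules_state = ET.SubElement(flt, f"{{{YANG_LIBRARY_NS}}}modules-state")
    module = ET.SubElement(modules_state, f"{{{YANG_LIBRARY_NS}}}module")
    # An empty leaf acts as a selection node: only this leaf is requested.
    ET.SubElement(module, f"{{{YANG_LIBRARY_NS}}}name")
    return rpc


if __name__ == "__main__":
    print(ET.tostring(build_get_with_subtree_filter("101"), encoding="unicode"))
```

With this filter, a server would be asked to return the `name` leaf of each `module` entry rather than the whole YANG library; pairing such a request with a sample reply is the sort of listing that would remove the ambiguity.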

The YANG server ontology is published via the W3ID service for permanent identifiers on the web, and relevant documentation on the artifacts introduced in the paper is provided via a GitHub repository, whose README gives instructions for generating the ontology documentation locally. The repository contains the ontology, its supporting documentation, and the ontology assessment results from the evaluation tools used. The RML mappings used for the use case are also available in the repository.

Overall, this is a very well presented ontology, accompanied by a practically useful use case.

Review #2
Anonymous submitted on 28/Sep/2024
Suggestion:
Major Revision
Review Comment:

The authors propose an ontology that enables describing the essential characteristics of YANG servers (i.e. network devices exposing an API compatible with the YANG formalism for configuration and operational data) to facilitate interactions with these servers via a knowledge graph construction pipeline using declarative mapping. In addition to the paper, the implementation of the ontology is available as open source on GitHub. An extension of the BURP tool accompanies this proposal (the merge request has been accepted by the owner of BURP) to validate the approach.

Overall, the authors adopt the perspective of network management and monitoring systems by providing the means to construct a knowledge graph based on the description of the network administration layer. The paper is carefully written, and the associated resources (ontology, documentation, mapping examples, tools, BURP contribution) are accessible and of good quality. The contribution is significant in relation to the importance of being able to use knowledge graph technologies for IT Service Management in telecommunications networks at the scale of operator networks. A few corrections and additions in the positioning and explanations of the proposal (see questions and remarks below) will make it a valuable resource for advancing this topic.

Questions and remarks:

- p. 1 "support new use cases such as network service assurance [7] or network digital twins [8]." : why do you only refer to YANG-based works? Why not extend the discussion to other works on cybersecurity and network management that utilize knowledge graphs built from various data sources, whether they describe networks or provide complementary insights on network management? Why not also discuss facilitating the management of YANG models through knowledge graphs?
- p. 2 "YANG data into knowledge graphs is still at an early stage with recent proposals like [9]" : could you provide additional references?
- p. 2 "Related work" : the section content looks more like describing the background knowledge on the knowledge graph construction topic rather than a related work analysis. For instance, the paper falls into the 'Ontology Description' category but also emphasizes access to YANG data sources via YANG servers for the purpose of constructing a graph. I understand the significance of the innovation presented in this paper and the inherent complexity of simultaneously addressing the challenge of accessing this data through declarative mapping while proposing an ontology to represent it. However, since you touch on both areas, I suggest addressing both in your analysis. For example, you state, 'there are no ontologies for describing access to YANG data sources that could be reused in RML,' which is a key point for consideration and a crucial element of your proposal, but it is very lightly argued in terms of motivations compared to more 'traditional' solutions that would involve deploying a YANG-compatible ETL pipeline and structuring the data according to ontologies such as DevopsInfra, NORIA-O, or SEAS.
- p. 2 "RML language is currently under development arranged into multiple modules" : are you referring to the current RML v1.2 standardization effort or the current RML v1.1 'unofficial draft' available at https://rml.io/specs/rml/ ?
- p. 3 "ontological requirements has been conducted over several sprints" & "extracted from interviews with experts from the network industry" : could you provide an overview of the profile of the experts who participated? What are their companies of origin? How many were there?
- p. 3 "derived from the analysis of standard specifications [3, 6, 16, 17]" : all of these references are RFCs; how did you extract the requirements from them? What were your criteria? Was it a manual extraction based on expert opinions, an extraction assisted by NLP, or an extraction through parsing associated implementations? You seem to use the Competency Questions method: what patterns of CQs appeared to be useful? More broadly, regarding the specification stage, what key principles and concepts did you identify, and how many?
- p. 3 "basic authentication credentials" : how do you envision the use of encrypted data or a reference to a vault? why not reuse already existing vocabularies (e.g. http://xmlns.com/foaf/spec/#term_OnlineAccount or the UCO observable:AccountAuthenticationFacet) to maximize the interoperability of a knowledge graph structured by your ontology with other knowledge bases?
- Listing 1 "ys:endpoint" : I understand that your implementation with a literal makes it easier for BURP to consider the endpoint for establishing the connection. However, as mentioned above with FOAF, why don't you reuse existing vocabularies for this kind of concept (e.g., observable:SocketAddress from UCO)?
- Listing 1 "" & "" : indicates that we have a Datastore, but what do you do with it? is it only for inventory purposes, or do you envision additional use cases? How do you envision tracking state changes over time?
- Section "5. Use case: evolution of the YANG Catalog" : It seems to me that the narrative around the experimentation is somewhat limited and resembles more of a test than an evaluation. Indeed, what conclusions do you draw from the execution of the pipeline and the resulting graph? What challenges did you encounter? What is the size of the input dataset? What is the size of the resulting graph? How did you test the compliance of the graph? Does the ontology ultimately meet the specifications and competency questions? How?
- p. 8 "the metadata can be integrated with data related to the network topology [24]" : same remark as for the "Related work" section ... why focus only on YANG-based data models for the network topology?
- p. 9 : "referencing the YANG Library Ontology" : could you provide details about what features you have added in https://github.com/kg-construct/BURP/pull/5 ?
- p. 9 : "unlocking real-time use cases" : could you provide details about the kind of use cases and ideas about how you will implement these (e.g. leveraging the Streaming MASSIF framework [P. Bonte, ISWC, 2020])?

Minor remarks:

- p. 3 "the requirements were captured in the form of natural language statements and stored in a CSV" : I suggest that you highlight here the https://github.com/candil-data-fabric/yang-server-ontology/tree/main/req... repository.
- Section 3.3. Publication : I suggest that you highlight the URL http://w3id.org/yang/server/ directly in the paragraph rather than hiding it in a footnote.
- p. 8 "ii) the semantic layer built with the knowledge [...]" : I suggest that you reformulate the idea using a "by facilitating ... thanks to ..." form.
- p. 9 "The ontology has been developed following a well-known, mature methodology" : I suggest that you reformulate it as "The ontology has been developed following a well-known knowledge engineering methodology".
- The colors of the listings can be difficult to read when the document is printed in black and white.
- Typo in the https://w3id.org/yang/server#ConventionalDatastore documentation, see "configuration datastores: , , , and ."
- Missing 'examples/cisco-example.ttl' file in https://github.com/candil-data-fabric/yang-server-ontology/tree/main/kno... to fully assess the proposal.

Review #3
Anonymous submitted on 18/Oct/2024
Suggestion:
Reject
Review Comment:

This manuscript is submitted as 'Ontology Description'.
In 10 pages, with only 26 references in total, it introduces the YANG server ontology, which describes the core concepts of the YANG data model and adds extensions to the specific NETCONF protocol which encodes YANG as XML and relies on SSH for interactions between clients and servers. The ontology is developed following the LOT methodology and conceptualized using CHOWLK, the documentation is generated using Widoco, and the ontology and its documentation are available at https://w3id.org/yang/server
The sources are on GitHub, with a total of 22 requirements listed in a CSV document and converted to SPARQL queries that can be executed through a Jupyter Notebook. An example Turtle file is available too.

In addition to introducing the ontology, the paper describes how the YANG server ontology can be combined with RML to generate a knowledge graph from the data available at a YANG server. More specifically again, this paper and the implementation focuses on the servers implementing the NETCONF protocol. The combination of this work with RML has been integrated in the reference RML implementation BURP (pull request #5).

In my opinion, the article does not meet the quality standards required for publication in the Semantic Web Journal:

(1) Although the paper is officially an ontology paper and should therefore be concise (10p would be fine), its actual scope is broader. The title shows that the paper focuses on the integration of the ontology into RML-based KG construction.

(2) As an ontology paper, I'd say that it has some shortcomings.

- It is the result of applying a mature methodology well, but many details are missing, including the timeline of the sprints, the number of participants (domain experts, ontology engineers), the number of pitfalls and how they were solved, the same for FOOPS!, etc.
- some statistics about the ontology would be welcome. How many classes, properties, what expressivity, ...
- some additional considerations such as modularity: it would have been useful to better separate what is generic (YANG) from what is specific to NETCONF. I guess basic authentication, for example, is not relevant for all YANG protocols. For a journal paper, it would probably also be appropriate to support at least one more YANG protocol such as RESTCONF, gNMI, or CORECONF (CORECONF is not mentioned in the paper).
- the way the YANG Server Ontology and RML can be combined could be specified using simple alignments, or more formally using SHACL rules.

(3) If I consider the part of the paper that focuses on the construction of knowledge graphs from NETCONF data sources (which is the focus as per the title, and also the most relevant to this special issue):

- we're missing a proper validation of the approach. It's a good point that the proposal has been merged into the BURP code base; however, this doesn't properly justify the validity of the approach. I would expect some validation through experiments in the paper, with a clear description of the setting (based, if I understand correctly, on CESNET/netopeer2). Statistics about KG generation would be relevant, including the duration, how this duration is shared between the YANG server, the network, and BURP, the size of the exchanged XML documents, the number of triples generated, the relevance of having filters on the server, etc.
- I miss some discussion about alternative ways to support the conversion of XML data on CORECONF servers. From my understanding of RFC6241, NETCONF must support SSH as a transport protocol (specified further in RFC6242), but other transport protocols could be defined incl. SOAP/HTTP/TLS. So an alternative could be to have data sources in RML send a SOAP request message, and interpret the SOAP response message. An alternative could also be to extend RML with support for SSH connections to some server, then have the logical source element describe what needs to be sent to the server, and how the response must be interpreted ...
- I miss some discussion about what would be different for another YANG protocol. What can be reused from the ontology and implementation, and what needs to be added?
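
The statistics requested in the validation point above (duration split, size of the exchanged XML, number of triples) could be collected with a small harness along the following lines. This is only a sketch under stated assumptions: `measure_run`, `RunStats`, `stub_fetch`, `stub_map`, the canned reply, and the `http://example.org/` IRIs are all hypothetical stand-ins, and nothing here calls BURP or a real NETCONF server. It also cannot separate server time from network time; that would require instrumentation on the server side.

```python
import time
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


@dataclass
class RunStats:
    fetch_seconds: float    # time spent retrieving the XML payload
    mapping_seconds: float  # time spent turning the payload into triples
    xml_bytes: int          # size of the retrieved payload
    triple_count: int       # number of triples produced


def measure_run(
    fetch: Callable[[], bytes],
    map_to_triples: Callable[[bytes], Iterable[Triple]],
) -> RunStats:
    """Time the retrieval and mapping phases separately and record sizes."""
    t0 = time.perf_counter()
    payload = fetch()
    t1 = time.perf_counter()
    triples = list(map_to_triples(payload))  # force any lazy evaluation
    t2 = time.perf_counter()
    return RunStats(t1 - t0, t2 - t1, len(payload), len(triples))


# Illustrative canned payload; it stands in for a real server reply.
SAMPLE_REPLY = (
    b"<modules-state xmlns='urn:ietf:params:xml:ns:yang:ietf-yang-library'>"
    b"<module><name>ietf-interfaces</name></module>"
    b"<module><name>ietf-ip</name></module>"
    b"</modules-state>"
)


def stub_fetch() -> bytes:
    """Hypothetical stand-in for a NETCONF retrieval."""
    return SAMPLE_REPLY


def stub_map(payload: bytes) -> Iterator[Triple]:
    """Hypothetical stand-in for an RML engine: one triple per module name."""
    ns = {"yl": "urn:ietf:params:xml:ns:yang:ietf-yang-library"}
    root = ET.fromstring(payload)
    for name in root.findall("yl:module/yl:name", ns):
        yield (
            f"http://example.org/module/{name.text}",
            "http://example.org/name",
            name.text or "",
        )


if __name__ == "__main__":
    print(measure_run(stub_fetch, stub_map))
```

Running the same harness with and without a server-side filter would give a first indication of how much the filter reduces payload size and mapping time.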

(4) Finally, I believe the paper could use more references or could choose its references better. For example, there is a reference for the modular RML that resulted from 3 years of existence of the KGC community group (ISWC 2023 Resource Track). The following papers may be highly related work:
- Ismail, H., Hamza, H. S., & Mohamed, S. M. (2018, December). Semantic enhancement for network configuration management. In 2018 IEEE Global Conference on Internet of Things (GCIoT) (pp. 1-5). IEEE.
- Sahlmann, K. (2021). Network management with semantic descriptions for interoperability on the Internet of Things (Doctoral dissertation, Universität Potsdam).
- Sahlmann, K., Scheffler, T., & Schnor, B. (2018, June). Ontology-driven device descriptions for IoT network management. In 2018 Global Internet of Things Summit (GIoTS) (pp. 1-6). IEEE.
The section about related work is really focusing on RML, with only 4 references. If the paper is about the ontology, then related ontologies should be considered.

Review #4
By Edna Ruckhaus submitted on 24/Nov/2024
Suggestion:
Minor Revision
Review Comment:

- Clarity.
+ HTML documentation is provided where classes and properties have been defined and the conceptual model diagram is presented and described.
+ The development of the ontology has followed a well-known methodology, LOT, and the steps have been described.

- Completeness and correctness
+ To validate the ontology regarding completeness, competency questions and their corresponding SPARQL queries need to be developed. These must also be published.
+ The correctness of the ontology can be validated with the results of the evaluation with the OOPS! and FOOPS! tools. These results must be added to the repository.

- Extensibility.
+ The ontology seems extensible to other protocols in quite a simple way: adding a subclass to the yg:YangServer class and adding specific classes and properties that are related to this subclass. This extensibility is illustrated with the NETCONF protocol.
+ There is a drawback that should be highlighted regarding the extension of the BURP engine. The extension is specific to the support of the NETCONF protocol. However, it is not clear what the implications for the engine extension are if a different protocol is required.

(2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology.
+ All the related resources are well organized and have been provided in public repositories, except the user stories and the results of the OOPS! and FOOPS! evaluation, which were already mentioned in the previous points.
+ The paper is clear and well written. Some improvements are advisable:
++ In the introduction, there is a mention of the "flexibility" of the YANG language. This could be briefly defined: flexible with respect to which requirements?
++ In the extension proposed for RML, the paper appears to state a requirement for mapping languages regarding data sources, and mentions that "filtering out data at the server ... brings multiple benefits". Clarify whether this is a general requirement of the KG construction community and whether it has been considered in other works.
++ Briefly describe the challenges in your work. You could add them to the Conclusions.