ENVOn: An ontology of 3D environment where a simulated manipulation task takes place

Tracking #: 3043-4257

Yingshen ZHAO
Arkopaul SARKAR
Mohamed Hedi KARRAY

Responsible editor: 
Guest Editors SW for Industrial Engineering 2022

Submission type: 
Full Paper
Thanks to the advent of robotics in shopfloor and warehouse environments, control rooms need to seamlessly exchange information about the dynamically changing 3D environment to facilitate task and path planning for robots. Adding to the complexity, this type of environment is heterogeneous, as it includes both free space and various types of rigid bodies (equipment, materials, humans, etc.). At the same time, 3D environment-related information is also required by virtual applications (e.g., VR techniques) for the behavioural study of CAD-based product models or the simulation of CNC operations. In past research, information models for such heterogeneous 3D environments have often been built without ensuring connections among the different levels of abstraction required by different applications. To address these multiple points of view and modelling requirements for 3D objects and environments, this paper proposes an ontology model that integrates the contextual, topological, and geometric information of both the rigid bodies and the free space. The ontology provides an evolvable knowledge model that can support simulated task-related information in general. It aims to greatly improve interoperability: a path planning system (e.g., for a robot) will be able to deal with different applications by simply updating the contextual semantics related to the targeted application, while keeping the geometric and topological models intact by leveraging the semantic links among the models.

Major Revision

Solicited Reviews:
Review #1
By Andrea Orlandini submitted on 13/Apr/2022
Major Revision
Review Comment:

The paper proposes a new ontology, called ENVOn, to represent 3D virtual scenes and to support simulated manipulation tasks. Such an ontology can serve a robotic system while it plans motions or tasks. The paper provides a rather wide analysis of the state of the art, considering works related to semantic information about environments and ontologies for robotics. It argues that past works focus on either domain-specific applications or formal geometric descriptions. Distinct works have also been presented that consider topological or semantic map knowledge. The authors underscore that a comprehensive approach seems to be missing and that a context-based knowledge structure should be considered.
This is the aim of ENVOn, with its three-tier knowledge architecture (i.e., geometric, topological and contextual layers) and inter-layer relations/constraints to maintain coherence. The paper provides a detailed presentation of the layers and of the concepts in each layer. At the end, it proposes two scenarios to validate and verify the ontology.

The general approach pursued in the paper seems interesting and significant.
Even though the work relies on previous works, the proposed approach seems to me original.

A first concern is the lack of strong motivation for the need for such a new ontology in a robotic context. While I see a clear benefit in using ENVOn to represent virtual scenarios, I would expect to see a wider discussion (and relevant motivation) of the actual need for ENVOn in robotics (e.g., connections with a motion planning module or a task planning system). As it is, the paper does not provide such strong connections with robotic applications.

The most appreciable point of the paper is that it considers multiple layers. Considering different abstraction layers, with suitable relations/constraints to maintain a valid general representation, seems to me a really valuable point. It makes it possible to provide richer representations of the same scenario and therefore to serve different kinds of queries. Indeed, ENVOn is shown to be able to support reasoning services at different abstraction levels, and this seems a clear advantage with respect to the state of the art.

Also, the simulation scenarios demonstrate the effectiveness of ENVOn in representing simulated manipulation examples, with a set of competency questions addressed through the defined representations. In general, this seems a good contribution.

In the final discussion, the authors also point out some limitations (the most obvious one being computation latency). In this regard, I would appreciate it if the authors also provided some insight into possible solutions to guarantee easy deployment and sustainable operation in concrete scenarios. Again, I would expect to see stronger connections with robotics-related future perspectives.

Finally, I think the presentation should be improved. In some parts of the paper the text formatting is somewhat strange. There are often large white spaces (e.g., p. 1 column 2, p. 4 column 2, and the bottom of pp. 7, 9 and 16), and in some cases the text is not well formatted (e.g., p. 14 column 2, p. 18 columns 1 and 2). Some typos (p. 1 column 2 "an geometrical", p. 12 column 2 "For example, The construction", p. 22 "[53]]") and errors (p. 16) also recur here and there. This may be due to the paper template (I guess the authors are using a Word template) but, in any case, a careful revision seems to be required. Also, Figure 11 is not very readable, and Figures 14 and 18 are composed of rather low-quality images. In general, the authors should check that figures have a good resolution (especially the 3D pictures).

Moreover, I would suggest using a small example (or one of the simulated scenarios) as a motivating use case. I would present this case at the beginning of the paper and use it as a motivating and running example throughout. It could be used in Section 3 to motivate why state-of-the-art approaches are not enough, e.g., to support the questions in Tab. 1. It could then be used to support the presentation of layers, concepts and constraints (in Sections 4 and 5). In my opinion, this would make the paper easier to follow.

The resources provided with the paper (in a GitHub repository) are well organized, though they should be better described (as far as I can see, the README file is empty). More information should also be provided to better support the use of the KB for replicating the queries.

In general, the paper seems to me worth considering for publication, but some more work is needed to improve its overall quality. Clearer robotics-related motivations should be provided, the presentation should be enhanced, and the description of the additional resources should be improved.

For the above reasons, I would suggest a major revision of the paper before it is considered for actual publication.

Review #2
Anonymous submitted on 30/May/2022
Minor Revision
Review Comment:

The authors present a knowledge model for semantic maps that takes into account, and links, items of knowledge from the geometric, topological, and "context" (i.e., application-dependent) levels.

I think the model is well presented and interesting, and I am keen to use it or its future developments at some point. I would only recommend minor revisions at this time.

A couple of more substantive points are below:

Context independent semantics (pgs. 12, 13): Environment Complexity and Congestion seem to be qualities that can vary with time. E.g., a room may be crowded during some hours but empty later, and a corridor that is passable now becomes impassable once some items are dropped there. However, the knowledge engineering techniques mentioned do not account for time variation. A little more discussion would help clarify matters here, e.g. by stipulating that the ontology is to be used for "sufficiently short" simulations that certain qualities can be thought of as time invariant. [I notice the Conclusion section discusses the fact that the ontology is at least so far intended to capture snapshots of the world, not the ongoing process of updating, but perhaps this should also be discussed previously]

Table 7, FunctionalObject definition, pg. 13: requiring a functional object to have exactly 1 function seems rather restrictive. In the industrial domain this seems like less of a limitation since tools and machinery seem to have one designated use and the users are strongly encouraged not to deviate from it, or outright forbidden from doing so. However the household domain is quite prone to item repurposing. As an example, an oven can be used to bake, but it can also be used for storage; a knife is typically used to cut, but it can also spread or collect butter. It doesn't seem though that ENVOn restricts its application domain to industry, so perhaps a relaxation of the "exactly 1" constraint may be in order.
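The difference between the two cardinality choices can be illustrated with a minimal Python sketch. It is only a closed-world count over asserted values, not OWL reasoning (an OWL reasoner under the open-world assumption would behave differently), and the object and function names are hypothetical, taken from the household examples in the comment above.

```python
from typing import Dict, List, Set

# Hypothetical individuals and their asserted functions (not from the paper).
asserted_functions: Dict[str, Set[str]] = {
    "oven": {"bake", "store"},
    "knife": {"cut", "spread"},
    "drill": {"drill"},
}


def violates_exactly_one(functions: Dict[str, Set[str]]) -> List[str]:
    """Objects whose number of distinct asserted functions is not exactly 1."""
    return sorted(name for name, fs in functions.items() if len(fs) != 1)


def violates_at_least_one(functions: Dict[str, Set[str]]) -> List[str]:
    """Objects with no asserted function at all (the relaxed constraint)."""
    return sorted(name for name, fs in functions.items() if len(fs) < 1)


print(violates_exactly_one(asserted_functions))
print(violates_at_least_one(asserted_functions))
```

Under the strict reading, the repurposed household items are flagged, while the relaxed "at least one" reading accepts all three objects.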

The next items are mostly related to aspects of presentation.

State of the art, pg. 4: I cannot understand this sentence: "Cailhol et al. [11] define a place as a topological graph connecting places borders built on octree decomposition." Perhaps you could rephrase it?

Table 1, pg. 5 (note, there is another Table 1 on page 7): CQ1 and CQ8 appear to be the same question. CQ14: perhaps rephrase the text "least complex to across" -- do you mean, easiest to go across?

Figure 1, pg. 6: the hasGeometricModel arrows are not consistently oriented.

Definition of AffineTransformationMatrix3D: as far as I know, 3D affine transformation matrices have (0 0 0 1) as their last row; relaxing this constraint allows projective transformations also (making these projective transformation matrices)
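The last-row condition mentioned above can be checked mechanically. The following is a minimal sketch, assuming matrices are stored as 4x4 nested lists in homogeneous coordinates; the function name `is_affine_3d` and the sample matrices are illustrative and do not come from the paper.

```python
from typing import Sequence

Matrix4 = Sequence[Sequence[float]]


def is_affine_3d(m: Matrix4, tol: float = 1e-9) -> bool:
    """Return True if m is a 4x4 matrix whose last row is (0, 0, 0, 1)."""
    if len(m) != 4 or any(len(row) != 4 for row in m):
        return False
    expected = (0.0, 0.0, 0.0, 1.0)
    return all(abs(a - b) <= tol for a, b in zip(m[3], expected))


# A pure translation: last row is (0, 0, 0, 1), so it is affine.
translation = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, -1.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]

# A non-trivial last row makes this a projective, not affine, transformation.
perspective = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5, 1.0],
]

print(is_affine_3d(translation), is_affine_3d(perspective))
```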

Definition of Axis3D, pg. 9: it is not clear from the definition given in table 3 that this is a subclass of Line3D

Definition of OrientedBoundingBox, pg. 8 (in Table 3) and pg. 9: it is not clear that the minimum and maximum points are Point3D, what is asserted is they are AxisPlacement3D (which also relates to a point, but may be overkill for specifying the boundaries of the box once the local coordinate system is also known)

Definition of topological graph elements, pgs. 11, 12: I would need some hand-holding here to understand how, e.g., two rooms, the corridor between them, and the doors from the rooms into the corridor become a topological graph. Intuitively, it should be the borders that are associated with edges (since borders seem to be defined between two places), with the places themselves as vertices of the graph. Nonetheless, the definition in the paper is the opposite: the nodes are borders (so can more than two places be incident on a border?), and an edge passes through a place. Does this mean the edge describes a way to get into the place it passes through, or out of it? [The figures given later for the evaluation scenarios don't help; I'd really like to have the graph drawn somewhere, with labelled vertices and edges.]
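To make the two readings concrete, here is a minimal Python sketch of the rooms-and-corridor example built both ways. The layout, the names, and the rule that an edge joins two borders sharing a place are assumptions made for illustration; the paper's actual construction may differ.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical layout: each border (door) lists the two places it separates.
borders: Dict[str, Tuple[str, str]] = {
    "door_A": ("room_A", "corridor"),
    "door_B": ("room_B", "corridor"),
}


def places_as_vertices(b: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Intuitive reading: places are vertices, each border is an edge.

    Returns (place, place, border_label) triples.
    """
    return [(p, q, name) for name, (p, q) in sorted(b.items())]


def borders_as_vertices(b: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Opposite reading: borders are vertices, an edge passes through a place.

    Two borders are joined when they share a place.
    Returns (border, border, place_label) triples.
    """
    edges = []
    for (b1, places1), (b2, places2) in combinations(sorted(b.items()), 2):
        for place in sorted(set(places1) & set(places2)):
            edges.append((b1, b2, place))
    return edges


print(places_as_vertices(borders))
print(borders_as_vertices(borders))
```

In this small layout the first reading yields two edges (one per door), while the second yields a single edge through the corridor connecting the two doors, and the two rooms appear on no edge at all.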

Review #3
Anonymous submitted on 26/Jul/2022
Major Revision
Review Comment:

The paper presents an ontology to support automated manipulation of objects in a 3D environment while taking advantage of virtual applications. The topic is relevant and fitting to the special issue.
Since the manuscript is a "full paper", it is reviewed along the following dimensions.

(1) originality

The authors have paid attention to the existing literature (even though some clusters of authors can be identified), but it's hard to appreciate the actual novel contributions.
The introduction lacks focus and fails to provide key motivations for the proposed ontology, in particular from an industrial perspective.

The need for a new ontology for geometric description is not properly motivated because the problem has already been addressed in the literature. STEP is mentioned, but it's not clear to what extent it was used.
IFC (and its ontology version ifcOWL) covers most of what is discussed about geometry and topology. In addition, GeoSPARQL and Well-Known Text (WKT) are potentially relevant and should be assessed, since they also provide mechanisms for automatic calculations.

(2) significance of the results

It is not clear why the "automated extracted taxonomy of AP203 is meaningless as STEP standards...".

The specifications (Section 3) could be better developed, because much space is dedicated to a specific little example. Competency Questions should be better explained. It would be beneficial to specify also requirements, goals, constraints, etc. The initial part of Section 4 would probably better fit Section 3.

If more than one ontology is used, then the addition of prefixes would help to avoid ambiguity.

The authors do not address the problem of generating the large set of data needed to instantiate a 3D environment and run queries. What would be a feasible workflow? Are you relying on CAD models? Point clouds?
Moreover, basic mathematical concepts (e.g., Vector3D, RotationMatrix3D, etc.) are defined in a verbose way that will pose challenges for the generation and querying of data.

The definition of Hole, Opening, and Container as context-dependent is quite surprising and not well motivated.

The presented simulation scenarios (Section 5.1) are very simple and have little relevance from an industrial perspective. In addition, the authors don't explain how the scenarios were instantiated; it is therefore assumed that this task was carried out manually, which makes it not scalable. Figures such as Figs. 11 and 15 are not very informative and could be replaced by a table and complete online resources.

What alternative technologies could help to answer the competency questions defined in Table 10 and Table 8? What is the actual advantage of using SPARQL queries?

SPARQL queries in Figs. 14 and 18 are customized to the specific scenario. Instead, general-purpose queries would be expected as reusable contributions, receiving as input only the URI of the solid body or place to be analyzed.
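As an illustration of what such a reusable query could look like, the sketch below builds a SPARQL query string from a single input URI. The `ex:` namespace, the `ex:hasBorder` property, and the function name are hypothetical placeholders, not ENVOn vocabulary; only the parameterization pattern is the point.

```python
# Characters that are not allowed inside a SPARQL IRI reference (<...>).
_FORBIDDEN = set('<>"{}|^`\\')


def build_border_query(place_uri: str) -> str:
    """Build a query listing the borders of the place identified by place_uri.

    The vocabulary (ex:hasBorder) is a hypothetical placeholder.
    """
    if not place_uri or any(c in _FORBIDDEN or ord(c) <= 0x20 for c in place_uri):
        raise ValueError(f"not a valid IRI for a SPARQL query: {place_uri!r}")
    return (
        "PREFIX ex: <http://example.org/envon#>\n"
        "SELECT ?border WHERE {\n"
        f"  <{place_uri}> ex:hasBorder ?border .\n"
        "}\n"
    )


print(build_border_query("http://example.org/scene#room_A"))
```

The same function can be called with the URI of any place in any scenario, and validating the input before interpolation avoids building a malformed or injected query.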

Overall, the paper is focused on geometry and topology data, so it is difficult to appreciate the advantage of using an ontology to also integrate other knowledge domains, since the contextual part is only marginally addressed.

Also, the conclusions (Section 6) fail to highlight the significance of results. The added value of aligning the proposed ontology with a top-level ontology is not clear in the scope of the paper.

(3) quality of writing

The use of English must be improved through careful proofreading. There are typos and sentences that need to be rephrased,
e.g., "to across" is not a verb (Table 1).

Table 8 is referenced on page 12, but it is not included nearby. There is a Table 8 on page 21, but its content is not consistent.

(A) The repository contains a README file, but it is basically empty, so the content is poorly documented.

(B) It is not possible to identify the implementation of the two simulation scenarios. Moreover, the repository doesn't include the relevant SPARQL queries. Therefore, the provided resources don't help to replicate the experiments.

(C) The authors provided the link to a GitHub repository.

(4) The provided data artifacts are not complete, as commented above.