Review Comment:
The dataset is relevant, and its creation required a significant amount of effort and resources from the institution. I know the context of this dataset, and of the other datasets published by the Library of the Chilean Congress (BCN), very well. I thus consider this dataset a fundamental component of the set of datasets published by the BCN, and it integrates well with them. However, the quality of the description of the dataset provided in the manuscript is not satisfactory, and there is room for improvement in the way the dataset is published. For this reason, and to improve both the manuscript and the impact of the dataset, I recommend a major revision.
Usefulness of the dataset
-------------------------
The submitted dataset is clearly relevant to the democratic process of Chile and constitutes an example of how to record legislative work. It focuses on the period from the end of the dictatorship to the present, but also includes some norms from before 1990. These earlier norms include those produced during the dictatorship (1973-1990) and norms related to the constitution of 1925. To clarify the context of this work, it is necessary to say that the current Chilean constitution was enacted in 1980, during the dictatorship, and that, after weeks of protests in 2019, citizens approved through a referendum the replacement of the 1980 constitution. Hence, the submitted dataset covers a relevant period of the legislative history of Chile, and some aspects of how laws are processed may change under the future constitution. These changes require a flexible model, and RDF is therefore a good option for publishing this data. The mission of the BCN is to help the parliamentary community fulfill its functions, and to give citizens access to legislative information, in order to guarantee the normal functioning of democracy and the preservation of the political heritage of Chile.
The impact of the dataset beyond the influence of the authors (e.g., other research groups using the dataset, or datasets linked to this dataset) is not documented. I would like to know whether there are projects outside the BCN consuming this data, or whether there is any reason that hinders its use. The existence or nonexistence of such projects has to be made explicit. The authors describe the use of the dataset in a project within their own range of influence. The dataset undoubtedly has a potential impact beyond the influence of the authors. The arguments in favor of the dataset's uptake would be stronger if its consumption beyond the influence of the authors were documented.
Clarity and completeness of the descriptions
--------------------------------------------
The manuscript does not provide a clear and complete description of the dataset. To use the dataset, one must understand other datasets that have complex vocabularies. The dataset described in the manuscript is integrated into this collection of datasets, so I would expect to see more example queries explaining the use of the dataset, and diagrams describing how this dataset is connected with the other datasets published by the BCN and with external datasets.
The description of related work is insufficient. For instance, a comparison with similar datasets from other countries is needed. There is no "related work" section. Reference [53], which addresses related work, is not cited in the body of the manuscript.
Recall that this manuscript has been submitted as a 'Data description', so it has to contain a concise description of a Linked Dataset that serves as a guide to its usage for various (possibly unforeseen) purposes. Section 4, Proof of Concept, does not serve this end. Describing a political analysis that uses the dataset is out of the scope of the track, and wastes several pages that could be used to improve the description of the dataset. Furthermore, that analysis is debatable. It is not clear whether it considers projects that are rejected by deputies and thus never voted on by senators. Since the text of a project changes along its stages, the differences in consensus may not be comparable. Many factors have to be considered to explain the analysis in Section 4. Again, the 'Data description' track is not for publishing studies based on the data, but for describing a dataset and improving its usage.
The manuscript does not clearly explain the bounds of the dataset. Lines 27-29 state that "[...] the main semantic web pieces of the dataset are: [...] the description of the dataset by means of DCAT [...]." Regarding the description of the dataset, an IRI of an instance of dcat:Dataset (http://datos.bcn.cl/recurso/catalogo/votaciones) is provided. However, the footnote indicates that this URI corresponds to a named graph. The footnote suggests that each dataset (including the one described in this manuscript) is published in a separate named graph. A description of the use of named graphs in the SPARQL endpoint is missing.
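To illustrate the kind of guidance that is missing, here is a minimal Python sketch that builds a standard SPARQL 1.1 query listing the named graphs of an endpoint with their triple counts. The endpoint URL https://datos.bcn.cl/sparql is my assumption and is not taken from the manuscript; the sketch only builds the request URL and does not send it.

```python
from urllib.parse import urlencode

# Assumed endpoint URL (not stated in the manuscript); replace as needed.
ASSUMED_ENDPOINT = "https://datos.bcn.cl/sparql"

# Standard SPARQL 1.1: count the triples stored in each named graph.
NAMED_GRAPHS_QUERY = """
SELECT ?g (COUNT(*) AS ?triples)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
ORDER BY DESC(?triples)
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that passes the query in the 'query' parameter,
    as defined by the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    print(build_request_url(ASSUMED_ENDPOINT, NAMED_GRAPHS_QUERY))
```

A table of this kind in the manuscript, mapping each named graph to the dataset it holds, would make the bounds of the dataset explicit.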
There are several datasets described in the document. The DCAT description refers to the dataset as the "Voting dataset", Figure 1 is about the "voting dataset", Section 3.1 is about the "Members of Congress and Political Parties" dataset, and Section 3.2 describes the "Bills" dataset. I suggest introducing a figure showing all datasets published in the BCN SPARQL endpoint and their relations.
To illustrate how complicated using the dataset can be, and thus the need for a better description, let us consider the query in Figure 5. The connection of a vote with a political party is not as clear as this query suggests, because some parliamentarians change their militancy. Also, some parliamentarians have been elected as independents (without militancy in a party) but in the same pact as candidates who do have militancy. To write this query, the user has to know the schema of other datasets. In particular, the predicates bcnbio:hasMilitancy and bcnbio:hasPoliticalParty are not described in the manuscript, but in the ontology of the "Members and Political Parties" dataset (documentation in Spanish here: https://datos.bcn.cl/ontologies/bcn-biographies/doc/). The description of the property bcnbio:hasMilitancy is not in the documentation, and the description of bcnbio:hasPoliticalParty says that its subject has to be a member of the class bcnbio:Candidato. The class bcnbio:Candidata is a subclass of org:Role, and its instances are related to the person (foaf:Person) via the predicate bcnbio:candidatureOf. This shows how complicated using the dataset can be, and why a better explanation of its use is needed.
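The issue with changing militancy can be sketched in a few lines of Python: attributing a vote to a party requires resolving the party at the date of the vote, not simply following a link from the person to a party. The class, field names, and data below are hypothetical and invented for illustration; they do not reflect the actual BCN schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Militancy:
    """One period of party membership; `end` is None while still active."""
    party: str
    start: date
    end: Optional[date] = None


def party_at(militancies: List[Militancy], vote_date: date) -> Optional[str]:
    """Return the party the person belonged to on `vote_date`, or None
    if no militancy covers that date (e.g., an independent parliamentarian)."""
    for m in militancies:
        if m.start <= vote_date and (m.end is None or vote_date <= m.end):
            return m.party
    return None


# Hypothetical data, invented for illustration only.
history = [
    Militancy("Party A", date(2010, 3, 11), date(2016, 6, 30)),
    Militancy("Party B", date(2017, 1, 1)),
]

print(party_at(history, date(2015, 5, 5)))  # Party A
print(party_at(history, date(2016, 9, 1)))  # None (between two militancies)
print(party_at(history, date(2020, 1, 1)))  # Party B
```

An example query in the manuscript that performs the equivalent date filtering over the published data would spare users from rediscovering this on their own.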
There are also some minor issues:
- "Semantic Web" is capitalized in some places and written in lower case in others.
- Line 6 of page 2 says, "Library of the Chilean National Congress (BCN)." It could be confusing that the initials do not correspond to the English name of the institution. It would be useful to explain that these initials correspond to the "Biblioteca del Congreso Nacional."
- The initials KG are used only once in the whole manuscript, without being introduced. It would be better to write "knowledge graph."
- Table 1 describes some URIs for the instances of the classes that appear in Figure 1. Why not all classes? Also, there are some classes in Table 1 that are not in Figure 1.
- Figures 1 and 5 have low resolution and look pixelated.
Quality and stability of the dataset
------------------------------------
The dataset has good quality in terms of the accuracy of the data and the integration with other datasets published by the BCN. However, there are some aspects of the dataset that can be improved:
- Lack of an explicit declaration of the licensing of the dataset. By its nature, the data is expected to be open; however, this is indicated neither in the DCAT description of the dataset nor in the manuscript. The BCN website has a page about the licensing of data and information (https://datos.bcn.cl/es/terminos-de-uso/), but it is not directly connected with the dataset. That license is based on the UK Open Government License (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/), but it is not stated as a formal license.
- The manuscript does not describe a preservation strategy for the dataset. There is no versioning policy, nor a way to download or access a specific version. The DCAT description of the dataset (https://datos.bcn.cl/recurso/catalogo/votaciones) states an accrual periodicity, but does not indicate its versions. Also, the DCAT description has both dct:issued and dct:modified set to 2011-12-05. This date cannot correspond to the last modification of the dataset, because the dataset includes bill votes from 2021.
- The DCAT description of the dataset provides access to the current version of the dataset in two ways:
(1) using the Linked Data Platform, by navigating through the RDF files corresponding to each entity of the dataset (e.g., file https://datos.bcn.cl/recurso/cl/proyecto-de-ley/3700-03), and
(2) the SPARQL endpoint.
No dumps of the dataset are provided. I know that the BCN has resources to maintain (1) and (2) for several years, but institutions are subject to change, so dumps may help third parties to preserve the dataset (and its versions).
- The dataset is published along with other datasets. Indeed, on page 5, line 24, it is said that several datasets have been published since [5]. The problem is that it is not clear how to separate the dataset described in the manuscript from previously published datasets.
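Regarding dumps and the separation of datasets raised in the list above, the following Python sketch shows how a third party could generate paged CONSTRUCT queries to copy a single named graph from a SPARQL endpoint. It assumes that the IRI given in the manuscript identifies a named graph holding the whole dataset, which the footnote suggests but does not confirm; the page size and page count are arbitrary. The sketch only builds the queries and does not run them.

```python
from typing import Iterator

# IRI given in the manuscript for the dcat:Dataset. That it also names a
# graph containing the whole dataset is an assumption based on the footnote.
GRAPH_IRI = "http://datos.bcn.cl/recurso/catalogo/votaciones"


def paged_construct_queries(graph_iri: str, page_size: int, pages: int) -> Iterator[str]:
    """Yield CONSTRUCT queries that copy one named graph page by page.

    ORDER BY keeps LIMIT/OFFSET pages stable in principle; endpoints may
    still cap result sizes, so this is a sketch, not a robust exporter.
    """
    for page in range(pages):
        yield (
            "CONSTRUCT { ?s ?p ?o } "
            f"WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }} "
            "ORDER BY ?s ?p ?o "
            f"LIMIT {page_size} OFFSET {page * page_size}"
        )


for q in paged_construct_queries(GRAPH_IRI, page_size=10000, pages=2):
    print(q)
```

A workaround of this kind is fragile and costly for the endpoint, which is why an official, versioned dump would be preferable.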