TANC HeritagePIDs

Linked Conservation Data and PIDs

Frances Madden (orcid.org/0000-0002-5432-6116)

Posted 12 November 2020

Linked Conservation Data (LCD) is a project led by the University of the Arts London and the Stanford University Library, amongst others. It aims to explore heritage conservation data, meaning conservation documentation records such as treatment reports expressed as structured data, and to apply Linked Open Data techniques to them. The project is funded by the UK's Arts and Humanities Research Council and is now in its second phase.

Conservation systems and practices do not have a tradition of assigning identifiers to records or to the objects with which they are concerned, and the systems therefore do not always use identifiers consistently. In addition, conservation systems are often separate and siloed from cataloguing and collection management systems, as is the case at the British Library amongst other institutions, which reduces interoperability across systems. The LCD project wants to explore research questions which integrated conservation information can answer, e.g. the use of particular techniques over time and their geographical distribution, as well as what conservation data can reveal about a collection. From an identifier perspective, the project’s work involves assigning identifiers both to conservation terms and to the items which are being conserved.

During this phase, the project has produced draft vocabulary guidelines to assist others in producing Linked Data for conservation records and the terms used within them. In phase one of the project, the need for identifiers as a key component of Linked Data was recognised. The project aims to explore and connect vocabulary terms held in different partners’ cataloguing systems, so that terms can be cross-linked. LCD have decided to use SKOS (Simple Knowledge Organization System), a long-established W3C framework for publishing thesauri, vocabularies and taxonomies in a structured and machine-readable format, thereby maximising interoperability. More generally, creating identifiers for vocabularies presents some specific requirements.
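As an illustration of what cross-linking terms with SKOS can look like, the sketch below serialises a vocabulary term as N-Triples and links it to a matching term in another partner's vocabulary using skos:exactMatch. It is a minimal sketch and not taken from the LCD guidelines: the example.org URIs, the label and the helper names (Concept, to_ntriples) are hypothetical. Only the SKOS and RDF namespace URIs and property names come from the W3C standards.

```python
from dataclasses import dataclass, field

# Namespaces defined by the W3C standards.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Concept:
    """A vocabulary term identified by a URI (hypothetical helper type)."""
    uri: str
    pref_label: str
    lang: str = "en"
    exact_matches: list = field(default_factory=list)


def to_ntriples(concept: Concept) -> list:
    """Serialise one concept as a list of N-Triples lines."""
    # Escape backslashes and double quotes inside the literal.
    escaped = concept.pref_label.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f"<{concept.uri}> <{RDF_TYPE}> <{SKOS}Concept> .",
        f'<{concept.uri}> <{SKOS}prefLabel> "{escaped}"@{concept.lang} .',
    ]
    # One skos:exactMatch triple per equivalent term in another vocabulary.
    for other in concept.exact_matches:
        lines.append(f"<{concept.uri}> <{SKOS}exactMatch> <{other}> .")
    return lines


# Hypothetical term held by one partner, matched to another partner's term.
local_term = Concept(
    uri="https://example.org/partner-a/terms/rebacking",
    pref_label="rebacking",
    exact_matches=["https://example.org/partner-b/vocab/0042"],
)
print("\n".join(to_ntriples(local_term)))
```

Where two terms differ in granularity, a looser SKOS mapping property such as skos:closeMatch or skos:broadMatch may be a better fit than skos:exactMatch.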

The project is exploring the extent to which terms can be matched, along with issues concerning the granularity of terms. It has also adopted the best-practice approach of assigning Uniform Resource Identifiers (URIs) to the terms used, and this requirement is included in the draft terminology guidelines. The project is also exploring how best to leverage existing infrastructure, such as existing thesauri, some of which already have URIs in use, e.g. the Getty Art and Architecture Thesaurus (AAT).

For vocabulary maintainers, LCD suggests registering URIs using the w3id.org system. The workflow diagrams produced by the project suggest either linking terms in an organisation’s system to an existing vocabulary which already uses URIs, or assigning locally produced URIs to the terms created.
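The two branches of that workflow can be sketched as a simple lookup: reuse the URI from an existing vocabulary where the term is already covered, and otherwise mint a local one. This is a minimal sketch under stated assumptions, not the project's own tooling. The w3id.org namespace shown is hypothetical (a real one would first have to be registered with w3id.org), and the existing vocabulary is represented here as a plain dictionary with made-up example.org URIs.

```python
import re

# Hypothetical namespace; a real one would be registered with w3id.org first.
LOCAL_NAMESPACE = "https://w3id.org/example-lcd/terms/"


def slugify(term: str) -> str:
    """Turn a term label into a URL-safe path segment."""
    return re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")


def uri_for_term(term: str, existing_vocabulary: dict) -> tuple:
    """Return (uri, how) where how is 'linked' or 'minted'.

    existing_vocabulary maps lower-cased labels to URIs that an
    established vocabulary already publishes.
    """
    key = term.strip().lower()
    if key in existing_vocabulary:
        # Branch 1: link to the URI the existing vocabulary already uses.
        return existing_vocabulary[key], "linked"
    # Branch 2: assign a locally produced URI.
    return LOCAL_NAMESPACE + slugify(term), "minted"


# Made-up stand-in for an existing vocabulary that already uses URIs.
existing = {"parchment": "https://example.org/vocab/parchment"}

print(uri_for_term("Parchment", existing))
print(uri_for_term("Tacket repair", existing))
```

In practice the match against an existing vocabulary is rarely an exact string comparison, so the lookup step is where the questions of term matching and granularity would need to be handled.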

The National Gallery is a partner in both LCD and the PIDs as IRO Infrastructure projects and has developed its own infrastructure for minting URIs as part of a pilot project, described in a project case study which is soon to be published. It has offered to host URIs for the next phase of the project using a simple vocab server or a ResearchSpace instance. The project’s guidelines for vocabulary terms raise the issue of whether URIs should be dereferenceable. While it is best practice that URIs resolve to information about the term referenced, at the moment this is not an absolute requirement. Making it a requirement would mean that entering the Linked Data realm would require significant new infrastructure to host and serve data, and the partners in Linked Conservation Data do not want to make it too difficult for organisations with limited resources to take this first step. In the meantime, employing identifiers in current systems helps organisations adopt good practices which can lead to making their conservation data available as Linked Open Data.

The benefit of these URIs being universally and globally resolvable, i.e. resolving to the record without the user knowing which institution created it, is not clear at this stage in the project, nor is it being actively explored, for the reasons described above. As an example of a globally resolvable identifier, ‘https://data.ng-london.org.uk/0F6J-0001-0000-0000’ could be expressed instead as a DOI such as ‘https://doi.org/10.12345/0F6J-0001-0000-0000’, removing the need for the user to know anything about the ‘data.ng-london.org.uk’ namespace. This capability may benefit some of the widely used vocabularies such as the AAT and encourage the adoption of a single system, which would become a de facto terminology hub, but issues of granularity and of mapping terms from internal systems would still need to be addressed. As most of the organisations involved in the project are memory institutions themselves, they may already have the infrastructure in place to provide long-lasting identifiers. However, the lack of centralised governance associated with locally managed identifiers presents a risk to the long-term accessibility of the URIs, and the interest and willingness of these organisations to commit to hosting these identifiers is also untested.
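The DOI example above amounts to moving the local identifier under a resolver and prefix that do not depend on the institution's domain. The sketch below shows only that string rewriting, using the placeholder prefix 10.12345 from the example. The function name as_doi_url is hypothetical, and a real DOI would additionally have to be registered with a DOI registration agency before it resolved anywhere.

```python
from urllib.parse import urlparse

DOI_RESOLVER = "https://doi.org/"


def as_doi_url(local_uri: str, doi_prefix: str) -> str:
    """Re-express an institution-specific URI as a DOI-style URL.

    Only the path of the local URI is kept as the DOI suffix, so the
    institution's domain no longer appears in the identifier.
    """
    suffix = urlparse(local_uri).path.lstrip("/")
    if not suffix:
        raise ValueError(f"no identifier found in {local_uri!r}")
    return f"{DOI_RESOLVER}{doi_prefix}/{suffix}"


# 10.12345 is the placeholder prefix used in the example, not a registered one.
print(as_doi_url("https://data.ng-london.org.uk/0F6J-0001-0000-0000", "10.12345"))
# → https://doi.org/10.12345/0F6J-0001-0000-0000
```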

The different entity types assigned URIs may also present different issues when it comes to their long-term maintenance. While vocabulary terms may be used by multiple organisations and would clearly warrant long-term, community-driven sustainability efforts to maintain the URIs for the terms and keep them updated, the same may not be the case for the URIs for collection items undergoing conservation, as these may only be linked to and used by their holding organisation.

Some, but not all, of the partners involved in the project are already able to create internal identifiers within their own systems. While internal system identifiers may provide some degree of linkage across internal systems, e.g. between cataloguing and conservation, they are not necessarily persistent and may not be ideal for Linked Data even if they are formed as URIs. It seems sensible for any organisation that is implementing a new system or upgrading an existing one to consider exposing this data in an open format as part of the implementation. The LCD project is investigating hosting the infrastructure on a repository or portal held jointly by the project partners, which will be machine-readable and queryable via a SPARQL endpoint and API, with web-browsing capability for human-readable access. This portal, an instance of ResearchSpace, will provide a useful prototype demonstrating the potential of the approaches, but a model for its long-term sustainability is yet to be decided.
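To give a sense of what machine access through a SPARQL endpoint involves, the sketch below builds a query request following the standard SPARQL protocol and parses a response in the standard SPARQL JSON results format. It is a sketch only: the endpoint URL is a hypothetical placeholder rather than the project's portal, the query assumes terms are published as SKOS concepts, and the request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the project portal's real address is not assumed here.
ENDPOINT = "https://example.org/sparql"

# List up to ten vocabulary terms and their preferred labels.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
} LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload: str) -> list:
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    data = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request to urllib.request.urlopen and the decoded response body to parse_bindings would complete the round trip against a live endpoint.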

While sustainability is a challenge for any type of identifier, identifiers for heritage objects address several different use cases, including reference and citation as well as conservation. Conservation data is anticipated to be only one of those use cases, which strengthens the case for assigning identifiers to collection items. The survey conducted by the PIDs as IRO Infrastructure Project (described here) found that many of the organisations who were using PIDs were using URIs as a result of applying Linked Open Data approaches to their collection items (summary data). The use case for conservation data presents another dimension to these efforts and could result in identifiers being embedded in aspects of collections where they otherwise would not be.