Digital preservation is not limited to infrastructure and data format issues; describing the digital object is also necessary to ensure traceability and to know:
- its identifier and intellectual content;
- which files compose it and what their technical characteristics are;
- what structure it has, especially when it is composed of several files;
- its context of production or reception at the BnF;
- which operations it underwent and which software or human agents were involved in such operations.
For this purpose, every information package in the SPAR system is described by preservation metadata.
Metadata formats used in SPAR
The BnF primarily uses two metadata formats to describe information packages: METS and PREMIS. METS serves as a container format that gathers metadata in different formats in a single file, called the “manifest”. For specific data formats, other specialized technical metadata formats are used. Descriptive metadata is kept very simple and is expressed in Dublin Core, because the ARK identifier links the object's complete description in the library catalog to its digital representation in the SPAR repository.
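To make the container role of METS concrete, here is a minimal sketch of a manifest that wraps Dublin Core in a METS descriptive metadata section. It uses only Python's standard library. The function name `build_manifest`, the internal IDs, and the ARK value (which uses the 99999 test namespace) are illustrative assumptions, not the actual SPAR manifest layout.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def build_manifest(ark: str, title: str) -> ET.Element:
    """Build a tiny METS document carrying Dublin Core (hypothetical layout)."""
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata section: METS wraps the Dublin Core elements.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD.1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="DC")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC_NS}}}identifier").text = ark
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # METS requires at least one structural map; this one only points
    # back to the descriptive section.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", TYPE="physical")
    ET.SubElement(struct_map, f"{{{METS_NS}}}div", DMDID="DMD.1")
    return mets


# Placeholder values for illustration only.
manifest = build_manifest("ark:/99999/fk4example", "Example document")
print(ET.tostring(manifest, encoding="unicode"))
```

A real manifest would also carry administrative metadata and a file section; they are omitted here to keep the container idea visible.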
Preservation metadata implementation in SPAR
METS is a very flexible format, so documenting the institution's implementation choices is fundamental. This is achieved by creating a METS profile.
The following major choices about METS have to be made:
- Which sections should be used (almost all of them are optional)?
- Which file groups should be defined?
- Which structural maps are necessary (physical, logical, etc.)?
- Which levels of description should be defined in the structural maps?
- What should be the structure of internal identifiers?
Best practices for using METS and PREMIS together have been followed (which METS sections should hold the descriptions of PREMIS entities, how to deal with elements that are redundant between the two standards, etc.).
The diagram below shows how the BnF METS manifest integrates PREMIS and other metadata formats.
Use of METS and PREMIS formats in SPAR
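As an illustration of one such integration choice, the sketch below records a PREMIS event inside a METS `digiprovMD` section, a placement commonly recommended when the two standards are combined. Whether the BnF profile uses exactly this placement is not stated here. The helper name `add_premis_event`, the ID pattern, the identifier type "local", and the sample date are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

METS = "http://www.loc.gov/METS/"
PREMIS = "http://www.loc.gov/premis/v3"


def add_premis_event(mets_root, event_id, event_type, when):
    """Append a PREMIS event wrapped in mets:digiprovMD (hypothetical helper)."""
    # Reuse the administrative section if present, otherwise create it.
    amd = mets_root.find(f"{{{METS}}}amdSec")
    if amd is None:
        amd = ET.SubElement(mets_root, f"{{{METS}}}amdSec")

    prov = ET.SubElement(amd, f"{{{METS}}}digiprovMD", ID=f"EVT.{event_id}")
    wrap = ET.SubElement(prov, f"{{{METS}}}mdWrap", MDTYPE="PREMIS:EVENT")
    data = ET.SubElement(wrap, f"{{{METS}}}xmlData")

    event = ET.SubElement(data, f"{{{PREMIS}}}event")
    ident = ET.SubElement(event, f"{{{PREMIS}}}eventIdentifier")
    ET.SubElement(ident, f"{{{PREMIS}}}eventIdentifierType").text = "local"
    ET.SubElement(ident, f"{{{PREMIS}}}eventIdentifierValue").text = str(event_id)
    ET.SubElement(event, f"{{{PREMIS}}}eventType").text = event_type
    ET.SubElement(event, f"{{{PREMIS}}}eventDateTime").text = when.isoformat()
    return event


root = ET.Element(f"{{{METS}}}mets")
# Placeholder date, not a real SPAR event.
add_premis_event(root, 1, "ingestion", datetime(2020, 1, 1, tzinfo=timezone.utc))
print(ET.tostring(root, encoding="unicode"))
```

The sketch does not enforce the element order required by the METS schema; it only shows where the PREMIS event sits relative to the METS wrapper elements.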
Creating preservation metadata in SPAR
Manifest creation is a multi-stage process:
- External or internal producers create Data Objects and, in some cases, metadata (provenance and structure). The digital object's descriptive metadata is retrieved from external sources or from the BnF catalog.
- A Submission Information Package (SIP) is created by the pre-ingest modules of BnF entry channels from Data Objects and metadata.
- When the SPAR system generates the Archival Information Package (AIP), the manifest is enriched with technical metadata extracted from the files by characterization tools. An ARK identifier is assigned to the digital object. Ingestion events are also recorded in the manifest.
- After the ingest operation is complete, the manifest information is converted into RDF and stored in a database so that it can be queried and used for data management.
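The last step, converting manifest information into RDF, can be sketched as follows. This is a simplified illustration using only the standard library: it reads the Dublin Core elements from a small manifest and emits N-Triples lines. The function names, the sample manifest, the `https://example.org/` base IRI, and the choice of N-Triples are assumptions; the actual SPAR conversion and data model are not described in this text.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical manifest fragment with placeholder values.
MANIFEST = f"""<mets xmlns="http://www.loc.gov/METS/" xmlns:dc="{DC}">
  <dmdSec ID="DMD.1"><mdWrap MDTYPE="DC"><xmlData>
    <dc:identifier>ark:/99999/fk4example</dc:identifier>
    <dc:title>Example document</dc:title>
  </xmlData></mdWrap></dmdSec>
</mets>"""


def escape_literal(value):
    """Escape the characters that N-Triples string literals require."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def manifest_to_ntriples(xml_text, base="https://example.org/"):
    """Turn each Dublin Core element of the manifest into one triple."""
    root = ET.fromstring(xml_text)
    ark = root.findtext(f".//{{{DC}}}identifier")
    if ark is None:
        raise ValueError("manifest has no dc:identifier")
    subject = f"<{base}{ark.strip()}>"

    prefix = "{" + DC + "}"
    lines = []
    for element in root.iter():
        if element.tag.startswith(prefix) and element.text:
            local_name = element.tag[len(prefix):]
            literal = escape_literal(element.text.strip())
            lines.append(f'{subject} <{DC}{local_name}> "{literal}" .')
    return lines


triples = manifest_to_ntriples(MANIFEST)
for line in triples:
    print(line)
```

The resulting lines could then be loaded into any RDF store and queried, for example to list every package that shares a given property.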
The preservation metadata lifecycle in SPAR is shown in the diagram below.
Preservation metadata lifecycle in SPAR diagram