Since it began digitizing in the early 1990s, BnF has treated digitization as a reproduction and preservation technique in its own right. This principle is reflected in choices relating to formats, resolutions, and image acquisition. Digitization in image format continues to be a priority, since it makes it possible to provide users with a faithful reproduction of an original document. Accordingly, digital documents are structured and organized using specific methods to ensure that they can be both properly viewed and protected.
The full range of digitization rules is brought together in BnF’s Technical Digitization Charter (Charte technique de numérisation).
Metadata
Metadata is a structured collection of information describing a given resource.
Metadata does not necessarily describe electronic documents, and is not necessarily contained within the document it describes.
Identifying digital documents: the refNum XML schema
The refNum XML schema was developed by BnF in the 1990s to manage digital document production metadata. It is a proprietary schema belonging to BnF and used by applications that check data delivered by service providers and the library’s studios. Although similar to the METS standard, the refNum schema is simplified so as to facilitate production.
The refNum XML schema records the descriptive and technical data associated with a digital document. It has two main functions:
- Identifying a document by way of bibliographic metadata, production metadata (digitization date, resolution, scanners used, processing history, etc.), and structural metadata.
- Matching digital images with their logical equivalents in the original document. For example, a refNum file might enable a user connected via the Internet to view page 3 of the original document, even though that page corresponds to image number 5 in the digital document.
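The page-to-image matching described above can be illustrated with a minimal Python sketch. It does not use the actual refNum element names, which are not given here; the `object_views` list, its tuple layout, and the `build_page_index` helper are all assumptions made for illustration.

```python
# Hypothetical structural data: (image number, page label in the original).
# Unnumbered images (cover, flyleaf) carry no page label.
object_views = [
    (1, None),   # front cover, unnumbered
    (2, None),   # flyleaf, unnumbered
    (3, "1"),
    (4, "2"),
    (5, "3"),
]


def build_page_index(views):
    """Map each page label of the original to the image that reproduces it."""
    index = {}
    for image_number, page_label in views:
        if page_label is not None:
            # Keep the first image found for a label if it appears twice.
            index.setdefault(page_label, image_number)
    return index


page_index = build_page_index(object_views)
image_for_page_3 = page_index["3"]  # image number that shows page 3
```

With this sample data, a request for page 3 resolves to image number 5, which matches the example in the text.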
The schema consists of three main components:
- Bibliography: this succinct high-level data defines the original document type (graphic material, monograph, periodical, etc.), title, author, publication date, and number of pages. This data is not a substitute for catalog data.
- Production: this data provides information about digitization conditions, including in particular the date of digitization, number of images, delivery-related data, and a list of processes and their history.
- Structure: this is a list of the images (or “object views”) comprising a digital document and their metadata, which varies from project to project: image content key if required, foliation or pagination, capture depth, resolution, etc., as well as comments on consistency with the original document.
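As a rough illustration of the three components, the following Python sketch models them as data classes and adds one consistency check of the kind a delivery-checking application might perform. All class names, field names, and the `check_image_count` function are hypothetical; they do not reproduce the real refNum elements or BnF's actual controls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bibliography:
    document_type: str                  # e.g. "monograph", "periodical"
    title: str
    author: Optional[str] = None
    publication_date: Optional[str] = None
    page_count: Optional[int] = None


@dataclass
class Production:
    digitization_date: str
    image_count: int
    processing_history: List[str] = field(default_factory=list)


@dataclass
class ObjectView:
    image_number: int
    page_label: Optional[str] = None    # foliation or pagination
    resolution_dpi: Optional[int] = None
    comment: Optional[str] = None       # note on consistency with the original


@dataclass
class DigitalDocument:
    bibliography: Bibliography
    production: Production
    structure: List[ObjectView]


def check_image_count(doc: DigitalDocument) -> bool:
    """Return True if the declared image count matches the listed views."""
    return doc.production.image_count == len(doc.structure)


# Invented sample record, for illustration only.
sample = DigitalDocument(
    bibliography=Bibliography(document_type="monograph", title="Sample title"),
    production=Production(digitization_date="2008-05-14", image_count=2),
    structure=[ObjectView(1), ObjectView(2, page_label="1", resolution_dpi=400)],
)
sample_is_consistent = check_image_count(sample)
```

Keeping the three components as separate objects mirrors the separation described above: bibliographic data stays succinct, while per-image details live only in the structure list.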
From 2009 onwards, in the context of the implementation of the Scalable Preservation and Archiving Repository (Système de Préservation et d’Archivage Réparti/SPAR), metadata on digital documents will be expressed using METS. Since the refNum schema is being maintained for production purposes, metadata files will be converted into METS format when they are added to the archiving system.
METS = Metadata Encoding and Transmission Standard
The METS schema, which is maintained by the Library of Congress, is a standard for encoding descriptive, administrative, and structural metadata specific to digital objects.
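To give an idea of how structural metadata looks in METS, the sketch below builds a small METS structural map (`structMap`) with Python's standard `xml.etree.ElementTree` module. It is not BnF's conversion procedure: the input tuples, the `IMG0005`-style file identifiers, and the `build_struct_map` function are assumptions, and the fragment omits the file section (`fileSec`) that a complete METS document would need to resolve the `FILEID` references.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_struct_map(pages):
    """Build a METS fragment from (image_number, page_label) tuples."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", {"TYPE": "physical"})
    root_div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "document"})
    for image_number, page_label in pages:
        attrs = {"TYPE": "page", "ORDER": str(image_number)}
        if page_label is not None:
            # ORDERLABEL carries the page number as printed in the original.
            attrs["ORDERLABEL"] = page_label
        div = ET.SubElement(root_div, f"{{{METS_NS}}}div", attrs)
        # Hypothetical file identifier; a real document would declare it in fileSec.
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": f"IMG{image_number:04d}"})
    return mets


mets_root = build_struct_map([(4, "2"), (5, "3")])
mets_xml = ET.tostring(mets_root, encoding="unicode")
```

Here `ORDER` holds the position of the image in the digital document and `ORDERLABEL` the page number of the original, so the pairing of image 5 with page 3 is preserved in the METS representation.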