National Library of France

Digitization and metadata

From its beginnings in the early 1990s, BnF has treated digitization as a reproduction and preservation technique in its own right. This principle is reflected in its choices of formats, resolutions, and image acquisition methods. Digitization in image format remains a priority, since it provides users with a faithful reproduction of the original document. Accordingly, digital documents are structured and organized using specific methods to ensure that they can be both properly viewed and protected.
The full range of digitization rules is brought together in BnF’s Technical Digitization Charter (Charte technique de numérisation).

Digital documents and metadata

A digital document is a set of related files, identified by a unique identifier and accompanied by a collection of metadata:
  • descriptive metadata, intended to:
    • provide a detailed, in-depth bibliographic description in a standardized data exchange format;
    • link the document to its original or to different versions of the document; 
    • provide access to a digital copy.
  • structural metadata, intended to:
    • link together files belonging to the same document;
    • reconstruct the document structure: identify all files comprising a document (text files, images, etc.); identify physical relationships between those files (display order, target file through which all other files are accessed, etc.).
  • administrative metadata, intended to:
    • manage rights: access rights (copyright, confidentiality, etc.) and usage rights (printing, reproduction, and editing rights, etc.); 
    • preserve technical information needed to read files; 
    • protect the integrity of files and monitor any changes made to them.
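
The integrity role of administrative metadata can be illustrated with a minimal sketch. It assumes a hypothetical DigitalDocument container and the use of SHA-256 digests; neither the class, its field names, nor the choice of hash comes from BnF's specifications.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalDocument:
    """Hypothetical container: one identifier, its files, three metadata groups."""
    identifier: str
    files: Dict[str, bytes]                                    # file name -> content
    descriptive: Dict[str, str] = field(default_factory=dict)  # e.g. title, author
    structural: List[str] = field(default_factory=list)        # file names in display order
    administrative: Dict[str, str] = field(default_factory=dict)  # file name -> SHA-256 digest


def record_checksums(doc: DigitalDocument) -> None:
    """Store a SHA-256 digest for every file as administrative metadata."""
    for name, content in doc.files.items():
        doc.administrative[name] = hashlib.sha256(content).hexdigest()


def changed_files(doc: DigitalDocument) -> List[str]:
    """Return the names of files whose content no longer matches the stored digest."""
    return [
        name
        for name, content in doc.files.items()
        if doc.administrative.get(name) != hashlib.sha256(content).hexdigest()
    ]
```

Calling record_checksums once at ingest and changed_files at each later audit is one simple way to detect that a file has been altered since it was delivered.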

Metadata

Metadata is a structured collection of information describing a given resource.
Metadata does not necessarily describe electronic documents, and is not necessarily contained within the document it describes.

Identifying digital documents: the refNum XML schema

The refNum XML schema was developed by BnF in the 1990s to manage digital document production metadata. It is a proprietary schema belonging to BnF and used by applications that check data delivered by service providers and the library’s studios. Although similar to the METS standard, the refNum schema is simplified so as to facilitate production.

The refNum XML schema records the descriptive and technical data associated with a digital document. It has two main functions:

  • Identifying a document by way of bibliographic metadata, production metadata (digitization date, resolution, scanners used, processing history, etc.), and structural metadata.
  • Matching digital images with their logical equivalents in the original document. For example, a refNum file might enable a user connected via the Internet to view page 3 of the original document, even though that page corresponds to image number 5 in the digital document.
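
The page-to-image matching described above can be sketched as a simple lookup. The data layout and the function names below are assumptions made for illustration; they do not reproduce the refNum schema itself.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical structure data: (image number, page label in the original).
# Unnumbered leaves such as a cover or a blank page carry an empty label.
VIEWS = [
    (1, ""),   # cover
    (2, ""),   # blank leaf
    (3, "1"),
    (4, "2"),
    (5, "3"),
]


def build_page_index(views: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Map each page label to the first image that carries it."""
    index: Dict[str, int] = {}
    for image_number, page_label in views:
        if page_label and page_label not in index:
            index[page_label] = image_number
    return index


def image_for_page(index: Dict[str, int], page_label: str) -> int:
    """Return the image number for a page label; raises KeyError if unknown."""
    return index[page_label]


index = build_page_index(VIEWS)
print(image_for_page(index, "3"))  # prints 5
```

Here a request for page 3 resolves to image 5 because two unnumbered leaves precede the first numbered page.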

The schema consists of three main components:

  • Bibliography: this succinct high-level data defines the original document type (graphic material, monograph, periodical, etc.), title, author, publication date, and number of pages. This data is not a substitute for catalog data.
  • Production: this data provides information about digitization conditions, including in particular the date of digitization, number of images, delivery-related data, and a list of processes and their history. 
  • Structure: this is a list of the images (or “object views”) comprising a digital document and their metadata, which varies from project to project: image content key if required, foliation or pagination, capture depth, resolution, etc., as well as comments on consistency with the original document.
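
To make the three components concrete, the sketch below parses an illustrative XML record and checks that the number of images declared in the production data matches the number of views listed in the structure. The element and attribute names (record, bibliography, production, structure, view, imageCount) and all sample values are hypothetical placeholders, not the actual refNum vocabulary.

```python
import xml.etree.ElementTree as ET

# Illustrative record only: element names and values are invented for this sketch.
SAMPLE = """\
<record id="doc-0001">
  <bibliography>
    <title>Example title</title>
    <type>monograph</type>
  </bibliography>
  <production>
    <imageCount>3</imageCount>
  </production>
  <structure>
    <view order="1" file="0001.tif" page="" resolution="400"/>
    <view order="2" file="0002.tif" page="1" resolution="400"/>
    <view order="3" file="0003.tif" page="2" resolution="400"/>
  </structure>
</record>
"""


def summarize(xml_text: str) -> dict:
    """Read the three components and compare declared and listed image counts."""
    root = ET.fromstring(xml_text)
    views = root.findall("./structure/view")
    declared = int(root.findtext("./production/imageCount"))
    return {
        "id": root.get("id"),
        "title": root.findtext("./bibliography/title"),
        "declared_images": declared,
        "listed_images": len(views),
        "consistent": declared == len(views),
    }
```

A consistency check of this kind is one example of what a delivery-control application might verify before accepting a batch.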

From 2009 onwards, in the context of the implementation of the Scalable Preservation and Archiving Repository (Système de Préservation et d’Archivage Réparti/SPAR), metadata on digital documents will be expressed using METS. Since the refNum schema is being maintained for production purposes, metadata files will be converted into METS format when they are added to the archiving system.

METS = Metadata Encoding and Transmission Standard

The METS schema, which is maintained by the Library of Congress, is a standard for encoding descriptive, administrative, and structural metadata specific to digital objects.
See the METS primer and reference manual for complete documentation on METS.
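
As a rough illustration of what a METS document looks like, the sketch below builds a minimal skeleton with a file section and a physical structural map from an ordered list of image files. It is not BnF's conversion procedure: the function name, the identifier scheme (FILE0001, ...), and the "master" file group label are assumptions, and the descriptive (dmdSec) and administrative (amdSec) sections are omitted for brevity.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id: str, files: list) -> ET.Element:
    """Build a minimal METS tree: one fileSec and one physical structMap."""
    root = ET.Element(q("mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, q("fileSec"))
    file_grp = ET.SubElement(file_sec, q("fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    doc_div = ET.SubElement(struct_map, q("div"), {"TYPE": "document"})

    for order, name in enumerate(files, start=1):
        file_id = f"FILE{order:04d}"
        # Declare the file and where to find it.
        file_el = ET.SubElement(file_grp, q("file"), {"ID": file_id})
        ET.SubElement(
            file_el,
            q("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name},
        )
        # Reference the file from its position in the display order.
        page_div = ET.SubElement(
            doc_div, q("div"), {"TYPE": "page", "ORDER": str(order)}
        )
        ET.SubElement(page_div, q("fptr"), {"FILEID": file_id})
    return root


mets = build_mets("doc-0001", ["0001.tif", "0002.tif"])
print(ET.tostring(mets, encoding="unicode"))
```

The structural map carries the display order, while the file section carries file locations, which mirrors the separation between structural and technical information described earlier.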

Monday, March 21, 2011