National Library of France

Digitization and metadata

Since it began digitizing in the early 1990s, BnF has regarded digitization as a reproduction and preservation technique in its own right. This principle is reflected in its choices of formats, resolutions, and image acquisition methods. Digitization in image format remains a priority, since it provides users with a faithful reproduction of the original document. Accordingly, digital documents are structured and organized using specific methods to ensure that they can be both properly viewed and protected.
The full range of digitization rules is brought together in BnF’s Technical Digitization Charter (Charte technique de numérisation).

Digitization formats

Choice of format is critical to digitization quality and to the long-term preservation of digital documents. The format should be as open as possible and allow files to be indexed individually. If a compressed format is selected, the compression must be lossless (that is, fully reversible) if images are to be protected in the long term.

It is important to distinguish between archiving formats and distribution formats, so as to:

  • ensure that the preservation system is independent of viewing tools and standards;
  • manage access constraints (e.g. rights attached to digital documents, conditions for document reuse, etc.); 
  • ensure favorable conditions for online viewing of digital documents (display time, download time, etc.).

What is a format?

A format describes the way in which information is organized in a file. A data format is said to be open when its specifications are publicly available and can be freely used, with no legal constraints limiting their use or requiring the payment of fees. A format may be proprietary (fully or partially patented) but sufficiently well documented to enable its widespread use.

Some applications are able to make use of a given format based on certain identifying information such as the file extension or information contained in the file header. The majority of image formats used on the web are proprietary but open. They have become de facto standards. There are also open and free image formats, but these are less widespread.

Digitization, archiving, and distribution formats used at BnF

All technical digitization rules laid down by BnF are covered by a charter which describes, in particular, the full range of digitization formats.
  • Printed materials
Printed materials are digitized in monochrome at either 300 or 400 dpi and compressed into single page TIFF format using ITU-T Group 4 compression (UIT is the French abbreviation for ITU). A matching table is created linking the sequence number of each page image to the physical pagination of the original work.
Digitized works are then stored on the library’s browsing servers in multi-page TIFF format. Documents may be downloaded from Gallica, the internet digital library, in either TIFF or PDF format.
  • Fixed images
Opaque or transparent graphic materials (manuscripts, prints, photographs, maps, etc.) are usually digitized in color. Documents larger than A6 size are digitized at 300 dpi, while originals smaller than A6 size are digitized at 600 dpi.
Different resolutions (both 300 and 600 dpi) may be used on the same original if there are significant variations in format, size of information (illuminations, toponyms, small characters, etc.), or associated objects (cases, slipcases, etc.). The archiving format used is uncompressed single page TIFF, and the formats used for distribution are PNG and JPEG.
  • Press
Daily press titles are digitized in full in grayscale at 300 dpi, in uncompressed TIFF format. The archiving format used is single page TIFF, and the format used for distribution is JPEG 2000.

For all new projects, BnF has decided to digitize its documents at a minimum resolution of 400 dpi and to drop monochrome in favor of grayscale (apart from in special cases where illustrations are better rendered in monochrome). It is also looking into the possibility of using JPEG 2000 as an archiving format.
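The rules listed above can be summarized as a small lookup table. The following Python sketch is illustrative only: the names CaptureProfile, PROFILES, and fixed_image_dpi are hypothetical, and two points are assumptions that the text does not settle. First, "smaller than A6" is read as "fits within an A6 sheet (105 × 148 mm) in either orientation", with an original of exactly A6 size treated as small. Second, for printed materials the text does not name an archiving format explicitly, so the Group 4 single page TIFF is assumed.

```python
from dataclasses import dataclass

# ISO 216 A6 sheet in millimetres: (short side, long side).
A6_MM = (105.0, 148.0)


@dataclass(frozen=True)
class CaptureProfile:
    """Hypothetical record of the settings described for one material type."""
    color_mode: str
    dpi_options: tuple
    archive_format: str
    distribution_formats: tuple


PROFILES = {
    # Assumption: the Group 4 single page TIFF doubles as the archive copy.
    "printed": CaptureProfile(
        "monochrome", (300, 400), "single page TIFF, Group 4", ("TIFF", "PDF")
    ),
    "fixed_image": CaptureProfile(
        "color", (300, 600), "uncompressed single page TIFF", ("PNG", "JPEG")
    ),
    "press": CaptureProfile(
        "grayscale", (300,), "uncompressed single page TIFF", ("JPEG 2000",)
    ),
}


def fixed_image_dpi(width_mm, height_mm):
    """Return 600 for originals that fit within A6, otherwise 300.

    Assumption: the comparison ignores orientation, and an original of
    exactly A6 size counts as small.
    """
    short_side, long_side = sorted((width_mm, height_mm))
    fits_in_a6 = short_side <= A6_MM[0] and long_side <= A6_MM[1]
    return 600 if fits_in_a6 else 300


if __name__ == "__main__":
    print(fixed_image_dpi(90, 130))    # a small photograph
    print(fixed_image_dpi(210, 297))   # an A4 sheet
    print(PROFILES["press"].distribution_formats)
```

The sketch does not model the case where both 300 and 600 dpi are used on the same original, nor the 400 dpi minimum adopted for new projects.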

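Dpi (dots per inch): a measure of resolution giving the number of dots (pixels) captured per linear inch of the original. A digital image consists of juxtaposed image components (pixels) arranged in rows and columns, so the higher the dpi value, the more pixels are used to represent the same physical area.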
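As a worked illustration of what a dpi value implies, the sketch below converts the physical size of an original into pixel dimensions and estimates the size of the uncompressed image data. The A4 page (210 × 297 mm) is an example chosen here, not a case taken from the charter, and the function names are hypothetical. The size estimate covers pixel data only and assumes each row is padded to a whole number of bytes; real TIFF files add headers and metadata.

```python
import math

MM_PER_INCH = 25.4


def pixel_dimensions(width_mm, height_mm, dpi):
    """Convert a physical size in millimetres to pixel columns and rows."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Estimate raw pixel data size, padding each row to whole bytes."""
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return bytes_per_row * height_px


if __name__ == "__main__":
    w, h = pixel_dimensions(210, 297, 300)
    print(w, h)                          # 2480 3508
    print(uncompressed_bytes(w, h, 24))  # 24-bit color: 26099520
    print(uncompressed_bytes(w, h, 1))   # 1-bit monochrome: 1087480
```

At 300 dpi an A4 page yields about 26 MB of uncompressed 24-bit color data but only about 1 MB in 1-bit monochrome, before any compression is applied.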

Checking graphic materials

Ensuring that colors are faithfully reproduced requires vigilance throughout the processing sequence: color management is a particularly complex issue.
When capturing documents, care must be taken to:
  • ensure that there is sufficient illumination;
  • ensure that an effective digitization system suited to the project is used; 
  • calibrate the scanner: define a reference colorimetric state; 
  • use standardized test patterns; 
  • use ICC profiles to reproduce colors faithfully from one device to another (ICC profiles are files used to convert colors from one device's color space to the color space supported by another device, e.g. converting colors from RGB (red, green, and blue) into CMYK (cyan, magenta, yellow, and black) for printing); 
  • add to each digital document the images of test patterns digitized on the day on which the document was processed.

The following steps are necessary after image acquisition:

  • Control screens must be calibrated using a spectrophotometer or colorimeter.
  • Images should be checked visually against the physical originals, in daylight.

Monday, March 21, 2011