Access to digital resources: principles
A document’s table of contents and indexes (geographical indexes, indexes of people, etc.) are also converted to text so that the document can be searched and browsed.
The Gallica index thus consists of metadata, full text where available, existing tables of contents, image keys, and information harvested from external partners’ OAI repositories.
The search engine used by BnF is Lucene, which has also been used to power Wikipedia’s search.
Lucene is a free, open-source search library written in Java and used to index and search text.
In particular, it enables the various indexed elements of a document to be weighted relative to each other. For example, in a search for the word “wretched”, the most relevant documents (shown at the top of the list) will be those where the word is found in the metadata (e.g. the title) rather than only in the document content.
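The weighting described above can be illustrated with a minimal sketch. It is not Lucene code and does not reproduce Lucene's actual scoring formula or Gallica's configuration: the field names, the weights, and the sample documents are all hypothetical, chosen only to show how a match in the title can outrank several matches in the full text.

```python
from dataclasses import dataclass

# Hypothetical field weights (assumption, not Gallica's real settings):
# a match in the title counts more than a match in the full text.
FIELD_WEIGHTS = {"title": 5.0, "toc": 2.0, "text": 1.0}


@dataclass
class Document:
    doc_id: str
    fields: dict  # field name -> field text


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, lowercase."""
    return [t.strip('.,;:!?"“”()').lower() for t in text.split()]


def score(doc, term):
    """Sum, over all fields, of (field weight x occurrences of the term)."""
    term = term.lower()
    total = 0.0
    for name, text in doc.fields.items():
        occurrences = tokenize(text).count(term)
        total += FIELD_WEIGHTS.get(name, 1.0) * occurrences
    return total


def search(docs, term):
    """Return ids of matching documents, highest score first."""
    scored = [(score(d, term), d.doc_id) for d in docs]
    hits = [(s, doc_id) for s, doc_id in scored if s > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits]


# Made-up sample documents for illustration only.
DOCS = [
    Document("novel", {"title": "The Wretched",
                       "text": "A long novel about justice."}),
    Document("essay", {"title": "Essays on poverty",
                       "text": "The wretched poor, the wretched streets."}),
    Document("atlas", {"title": "Atlas of France",
                       "text": "Maps and plates."}),
]

if __name__ == "__main__":
    print(search(DOCS, "wretched"))
```

In this sketch the "novel" document ranks first: its single title match scores 5.0, while the two full-text matches in "essay" score only 2.0. A real Lucene index achieves a comparable effect by boosting fields within its own relevance model.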
Free software for Gallica
BnF favors the use of free software for reasons of sustainability, production cost, and software maintenance.
Wednesday, November 6, 2013