Digital legal deposit: four questions about Web Archiving at the BnF

Why archive the Web?

Because there is a need to preserve this new form of communication, which is now present in all areas of knowledge and all parts of society

E-administration, digital arts, online publications, e-learning, e-business, virtual exhibitions and digital libraries, blogs and new public spaces dedicated to discussion and chat… So many activities have moved to the Web, and new ones have been created there. With more than 30 million Internet users and an increasing number of French Websites, France has entered the Information Society, and heritage institutions must face this new reality.

Because archiving the Web through legal deposit extends the historical mission of collecting French cultural heritage

Legal deposit is the legal obligation for every publisher, printer, producer, distributor, or importer of documents to deposit copies of all published materials in the mandated institutions. Originally promulgated for printed books in 1537, legal deposit has been progressively extended to all types of materials of expression and creation, including new technologies as they appeared in France. After books, engravings, music scores, photographs, posters, audiovisual and multimedia documents, the time has come to archive Websites as well.

Because Web legal deposit is a legal obligation

The French Heritage Law (Code du patrimoine) now incorporates the DADVSI law (Droit d’auteur et droits voisins dans la société de l’information, that is, copyright and related rights in the information society; loi 2006-961), which was officially published on August 3, 2006. Title III (Articles L131-1 to L133-1) officially establishes legal deposit of the Web.

This law:

  • extends legal deposit to the Internet in the following terms: "is also liable to legal deposit every sign, signal, writing, image, sound or messages of every kind communicated to the public by electronic channels" (clause 39). The law applies to all types of “online electronic publications” constituting a set of signs, signals, images, sounds or any kind of message, as long as they are made available publicly on the Internet. Not only Websites, but also newsletters and streaming media are included in this definition.
  • specifies collecting strategies: the BnF's Internet legal deposit gives priority to bulk automatic harvesting by crawler robots: “Mandated institutions may collect material from the Internet by using automatic techniques or by setting specific agreements and deposit procedures together with the producers”. The law also stipulates that no obstacle such as login, password or other form of access restriction may be used by producers to restrict this process.
  • defines how Web deposit responsibilities are shared between mandated institutions: INA (the national institute for radio and television, which is responsible for preserving the audiovisual heritage of France) collects sites related to audiovisual communications (mostly radio and TV), and the BnF collects all other sites. The decree published in December 2011 specifies both selection and communication procedures for Web archive collections.

Who is involved?

Anyone who produces or publishes online material in order to communicate with the public by electronic channels is subject to legal deposit. The law applies to everyone who has some connection to the national territory, as has always been the case for other types of documents: the National Library of France collects “everything published (or imported) in France”.


Unlike legal deposit of other materials, Web legal deposit involves no particular procedure for producers, because it is essentially managed through automatic harvesting techniques run by the mandated institutions. The only obligation for the producer is to provide, when requested by the Library, access codes and technical information if automatic harvesting has failed. The BnF may also request a specific deposit procedure whenever the architecture of a selected site or the data format it uses is not compatible with automatic harvesting.

How does it work?

The Web is growing exponentially: it is possible neither to aim for exhaustiveness nor to select every site manually. To respond fully yet pragmatically to the challenges raised by Web legal deposit, the Library has chosen to combine two complementary collecting methods:

Bulk automatic harvesting of French Websites

Bulk harvesting is done by robots. In the past, the BnF worked in partnership with the Internet Archive (IA) to collect five annual “snapshots” of websites belonging to the French domain, beginning in 2004. Historical collections representing snapshots from 1995 to 2004 have also been acquired.

Since 2010, bulk harvesting has been performed by the BnF itself, with the constant aim of providing better coverage of French domain sites on a large scale.

Focused crawls

Focused harvesting is based on a selection of sites made by subject librarians at the BnF. Focused crawls can be built around an event (the French elections in 2002, 2004 and 2007 have been covered, as well as the European elections in 2009) or around a given theme (blogs, sustainable development, Web activism…).
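
As a rough illustration only, the combination of the two methods can be pictured as a scoping decision made for each URL a robot encounters. The sketch below is not the BnF's actual crawler configuration: the function name harvest_mode, the FOCUSED_HOSTS list, and the use of the .fr suffix as a stand-in for "the French domain" are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical hosts selected by subject librarians for a focused crawl.
FOCUSED_HOSTS = {"example-campaign.org", "blog.example.com"}


def harvest_mode(url, focused_hosts=FOCUSED_HOSTS):
    """Return 'focused', 'bulk', or None (out of scope) for a URL.

    Assumption: a host ending in '.fr' counts as part of the French
    domain for bulk harvesting; real scoping rules may be broader.
    """
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Librarian-selected sites take precedence, whatever their domain.
    if host in focused_hosts:
        return "focused"
    # Everything else is kept only if it falls under the assumed .fr scope.
    if host.endswith(".fr"):
        return "bulk"
    return None
```

In this sketch a selected site is always treated as part of a focused crawl, even when it also sits under .fr, which reflects the idea that the two methods complement rather than exclude each other.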

Who can access the archives?

The Web archives are accessible to authorized users of the BnF, in the reading rooms of the Research Library only. This restriction is the same as the one that applies to all legal deposit collections. As of June 22, 2009, the BnF offers 350 computers for consulting its Web archives across all its sites, in Paris and in Avignon.

Monday, March 10, 2014


Legal Deposit Department
Digital Legal Deposit Team