Web archives collected by the BnF can be consulted in the research reading rooms of either the François-Mitterrand or the Richelieu library.
A legal framework established in 2006
Under articles L131-1 to L133-1 and R131-1 to R133-1 of the Heritage Code, the library is responsible for collecting, preserving and providing access to French websites.
The most important crawls concern:
- websites in the .fr domain or in other extensions linked to French territory, such as .re, .nc, etc.
- websites in other domains (.com, .org, etc.) created by individuals based on French territory and whose content is produced in France.
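The first criterion above can be checked mechanically from a URL, while the second cannot. The following is a minimal sketch of that distinction, not the BnF's actual selection logic: the function name `crawl_scope`, the labels it returns, and the short `FRENCH_TLDS` set (limited to the extensions named in the list) are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Extensions named in the text; illustrative only, not the BnF's full list.
FRENCH_TLDS = {"fr", "re", "nc"}


def crawl_scope(url: str) -> str:
    """Classify a URL by its top-level domain (hypothetical helper).

    Returns "french-tld" when the extension is linked to French territory,
    and "needs-review" otherwise: a .com or .org site may still be in scope,
    but that depends on where its author is based, which a URL cannot tell us.
    """
    host = urlsplit(url).hostname or ""
    # Take the last label of the host name, ignoring a trailing dot.
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in FRENCH_TLDS:
        return "french-tld"
    return "needs-review"
```

For example, `crawl_scope("https://www.example.fr/page")` falls in the first category, whereas a `.com` address is only flagged for a human decision.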
Harvesting is carried out by robots that copy pages, images, animations, and audio and video files. The websites are then dated and indexed so that they can be viewed in their original publication context, which makes it possible to browse the archives from one link to another.
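To illustrate what dating and indexing make possible, here is a small sketch of a capture index that, for a given address, returns the most recent copy made on or before a chosen date. It is a simplified model under stated assumptions, not a description of the BnF's software: the class `CaptureIndex` and its methods are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


class CaptureIndex:
    """Hypothetical in-memory index: URL -> sorted list of capture dates."""

    def __init__(self):
        self._captures = defaultdict(list)

    def add(self, url, captured_at):
        # Insert the date at the position that keeps the list sorted.
        times = self._captures[url]
        times.insert(bisect_right(times, captured_at), captured_at)

    def closest_before(self, url, when):
        # Latest capture made on or before `when`, or None if there is none.
        times = self._captures.get(url, [])
        i = bisect_right(times, when)
        return times[i - 1] if i else None


index = CaptureIndex()
index.add("http://example.fr/", datetime(2016, 5, 1))
index.add("http://example.fr/", datetime(2012, 7, 27))
capture = index.closest_before("http://example.fr/", datetime(2014, 1, 1))
```

Resolving every followed link against the same reference date in this way is one plausible means of keeping a browsing session within a consistent period.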
Some representative samples
It is impossible to archive every site and every page of each site. The BnF therefore collects representative samples of French websites by combining two strategies:
- broad crawls, which archive samples of the French web (4.5 million websites in 2016). They are carried out once a year thanks to a partnership with the Association française pour le nommage internet en coopération (AFNIC), the OVH company (a consulting company specializing in computer systems and software) and the Office des postes et télécommunications de Nouvelle-Calédonie (OPT-NC). In 2016, the BnF also contacted the registries of the top-level domains for French overseas territories (.gf, .gp, .mq, .pf) and of regional top-level domains (.alsace, .bzh, .paris) to cover these areas better.
- focused crawls, which apply to about 20,000 websites selected by librarians and external partners. These websites are chosen for their subject (literature, sustainable development) or for their link to a specific event (such as elections or the 2012 Olympic Games). Focused crawls are also used to harvest the deep web (major documentary databases) and are carried out more frequently. For instance, about a hundred online newspapers are harvested every day to capture the latest news on the web.
Modes of consultation
At the end of 2016, the BnF's internet archives held 29 billion files, some dating back to 1996. You can search by typing in a website's name. In addition, a "guiding approach" focusing on a specific topic is offered to give a first overview of the collection.
There is no comprehensive list of the websites available in the internet archives. However, the pages of data.bnf.fr mention the websites selected for archiving by the BnF (e.g. http://data.bnf.fr/11932277/litterature_francaise/).
You may consult the internet archives in the Research Library's reading rooms, on the library's computers or on your own laptop via the AVEC portal.