

Internet archives

Web archives collected by the BnF can be consulted in the research reading rooms of either the François-Mitterrand or the Richelieu Library.

A legal framework established in 2006

Under articles L131-1 to L133-1 and R131-1 to R133-1 of the Heritage Code, the library is in charge of collecting, preserving and making available French websites.

The most important crawls concern:

  • .fr domain websites, or websites under extensions linked to French territory such as .re, .nc, etc.
  • Websites under other domains (.com, .org, etc.) created by individuals based on French territory and whose content is produced in France.
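The two scope rules above can be illustrated with a minimal sketch. This is not the BnF's actual selection logic: the function name `in_scope`, the `FRENCH_SUFFIXES` tuple and the `curated_hosts` set are hypothetical. The suffix list only includes the extensions named on this page, and the second rule (sites produced in France under other domains) cannot be decided from the URL alone, so it is modelled here as a hand-maintained list.

```python
from urllib.parse import urlsplit

# Only the extensions named in the text; the real list is longer ("etc.").
FRENCH_SUFFIXES = (".fr", ".re", ".nc")


def in_scope(url: str, curated_hosts: set[str]) -> bool:
    """Return True if the URL falls under either scope rule.

    Rule 1: the host ends with a suffix linked to French territory.
    Rule 2: the host was added by hand to `curated_hosts` (hypothetical),
            standing in for the human judgment that a .com or .org site
            is produced in France.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return host.endswith(FRENCH_SUFFIXES) or host in curated_hosts
```

For instance, `in_scope("http://www.bnf.fr/", set())` is accepted by the suffix rule, while a .com address is accepted only if its host has been added to the curated set.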

Harvesting is carried out by robots that copy pages, images, animations, and audio and video files. The websites are then dated and indexed so that they can be viewed in their original publication context, which allows readers to browse the archives from one link to another.
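To show why dating and indexing matter for browsing, here is a small sketch of an index that stores capture dates per URL and returns the most recent capture at or before a given date. The class `CaptureIndex` and its methods are hypothetical names chosen for illustration; the page does not describe how the BnF's index is actually built.

```python
from bisect import bisect_right, insort
from collections import defaultdict
from datetime import datetime
from typing import Optional


class CaptureIndex:
    """Maps each URL to a sorted list of the dates it was captured."""

    def __init__(self) -> None:
        self._captures: dict[str, list[datetime]] = defaultdict(list)

    def add(self, url: str, captured_at: datetime) -> None:
        # insort keeps the list sorted, so lookups can use binary search.
        insort(self._captures[url], captured_at)

    def closest_before(self, url: str, when: datetime) -> Optional[datetime]:
        """Return the latest capture at or before `when`, or None."""
        times = self._captures.get(url, [])
        pos = bisect_right(times, when)
        return times[pos - 1] if pos else None
```

When a reader follows a link from a page captured on a given date, a lookup of this kind would pick the capture of the target page closest to that date, which keeps the navigation within the same period.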

Some representative samples

It is impossible to archive every site and every page of each site. The BnF therefore collects representative samples of French websites, combining two strategies:

  • Broad crawls, which archive samples of the French web (4 million websites in 2013). These crawls are carried out once a year. Today, they mainly apply to .fr and .re domains thanks to an agreement with the French association AFNIC, and to .nc domains thanks to a partnership with the Office des postes et télécommunications de Nouvelle-Calédonie.
  • Focused crawls, which apply to about 30,000 websites selected by librarians and external partners. These websites have been chosen for their subjects (literature, sustainable development) or for their link to a specific event (such as the elections or the Olympic Games in 2012). Focused crawls are also used to harvest the deep web (major documentary databases) and are carried out more frequently. For example, about a hundred online newspapers are harvested every day to capture the latest content on the Web.

Modes of consultation

 

At the end of 2013, the BnF’s internet archives held 21.2 billion files; some files date back to 1996. Readers can search by typing in a website’s name. In addition, a “guided approach” focusing on a specific topic is offered to give a first overview of the collection.

There is no comprehensive list of the websites available in the Internet archives. However, the factsheets of data.bnf.fr mention the websites selected for archiving by the BnF (for example, http://data.bnf.fr/11932277/litterature_francaise/).

Readers who wish to search the full collections must use the dedicated computers in the reading rooms. To access the Internet archives, they must show that they require access to the Research Library’s collections for valid academic, professional or personal research.

Friday, August 29, 2014

Contact

Prior to your visit, you may e-mail depot.legal.web@bnf.fr to check whether the website you need is available.
