Web legal deposit: instructions


You are domiciled in France

Your site is collected by the BnF as part of the legal deposit of the French internet, through a largely automated harvesting process carried out by a bot.
All types of sites and content published on the internet are concerned: institutional and personal sites, magazines, blogs, commercial sites, digital books, social media, etc. Only the public sections are harvested.
Given the number of websites and web publications concerned, web legal deposit is not exhaustive.
Radio and television sites are collected by the Institut National de l’Audiovisuel (INA – National Audiovisual Institute).

You would like to submit your website to the BnF

You do not have to submit anything: neither a copy of your site's content on a physical medium (CD, DVD, USB key) nor a copy sent by email.
No action is required (you do not need to declare or send anything, or deposit a digital file).
Your only obligation is not to hinder the BnF in its collection process, in particular not to block the bot.
As the BnF cannot guarantee the exhaustiveness of its collections, you may, if you wish, report your publications or your website by sending their URLs to depot.legal.web@bnf.fr.
If the content of a site is inaccessible for technical reasons (database, password-protected content, access form, etc.) or economic reasons (paid content, subscription, etc.), it is currently impossible to collect it and no action is required on your part. If necessary, the BnF can contact you.
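One way a site can block a crawler without meaning to is an overly broad robots.txt rule. The sketch below uses Python's standard urllib.robotparser to check whether a given robots.txt text would refuse a crawler. The user-agent token "bnf.fr_bot" and the example.fr URLs are assumptions made for illustration, and this page does not say whether the BnF bot honours robots.txt, so treat the sketch as a self-check rather than a description of the BnF's behaviour.

```python
from urllib.robotparser import RobotFileParser

# Assumption: illustrative user-agent token; confirm the exact string
# with the BnF before relying on it.
ASSUMED_BNF_AGENT = "bnf.fr_bot"


def bot_is_allowed(robots_txt: str, url: str, agent: str = ASSUMED_BNF_AGENT) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A rule set that refuses every crawler for the whole site.
blocking_rules = """\
User-agent: *
Disallow: /
"""

# A rule set that only closes one directory.
permissive_rules = """\
User-agent: *
Disallow: /private/
"""

print(bot_is_allowed(blocking_rules, "https://example.fr/"))
print(bot_is_allowed(permissive_rules, "https://example.fr/blog/"))
```

The first rule set refuses every crawler, the second only closes /private/. Running the same check against your own robots.txt shows whether a rule written for other bots would also catch an archiving crawler.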

Your site is crawled

Once a year, the BnF carries out a broad crawl: the aim of this crawl is to archive as many French internet sites as possible, but not necessarily in their entirety.
Some sites are selected by librarians according to documentary criteria. They are the subject of focused crawls that vary in frequency (from once a year to several times a day) and depth (in whole or in part).
A site that has been the subject of a collection request is archived within a maximum period of three months.
The files are dated the day they are collected and indexed with their address (URL). In all cases, the BnF does not send a collection receipt.

Not an exhaustive process

To find out whether a site has been crawled, you can:
  • consult data.gouv.fr and api.bnf.fr: the lists of sites that have been the subject of targeted crawls and of the “electoral web” are available for consultation and download.
  • consult “Archives de l’internet” and find it using its URL address (see step 6)
  • write to the generic address depot.legal.web@bnf.fr
There is no list of sites crawled as part of the broad crawl (see step 3).

The files crawled: the web archives

Web archives are stored in the BnF’s SPAR (Scalable Preservation and Archiving Repository).

Consult "Archives de l’internet"

In accordance with the French intellectual property code, the archives are not online (on the internet).
They are available in the BnF’s research areas and in partner institutions in the French regions under the same conditions (registration and compliance with the intellectual property code).
Downloading files and taking screenshots are prohibited and prevented by security measures.
Web archives cannot be deleted, but access may be restricted in exceptional circumstances.

Suggest a site

It is also possible to suggest any site in the French domain for archiving by writing to depot.legal.web@bnf.fr. This includes:

  • .fr sites and all French top-level domains (websites ending in .bzh, .corsica, .paris, .gf, .mq, .gp, etc.)
  • All sites whose authors are domiciled in France
  • All sites produced or hosted in France
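The first criterion in the list above can be checked mechanically. The sketch below tests whether a URL's host ends in one of the top-level domains named there. The function name and the example URLs are hypothetical, and the set is deliberately incomplete because the source list ends with "etc.".

```python
from urllib.parse import urlsplit

# TLDs named in the list above; the source list is open-ended ("etc."),
# so this set is a partial illustration, not an official registry.
FRENCH_TLDS = {"fr", "bzh", "corsica", "paris", "gf", "mq", "gp"}


def has_french_tld(url: str) -> bool:
    """Return True if the URL's host ends in one of the listed TLDs.

    The URL must include a scheme (http:// or https://); otherwise
    urlsplit finds no hostname and the function returns False.
    """
    host = urlsplit(url).hostname or ""
    # Drop a trailing root dot, then keep what follows the last dot.
    tld = host.rstrip(".").rpartition(".")[2].lower()
    return tld in FRENCH_TLDS


print(has_french_tld("https://example.bzh/page"))
print(has_french_tld("https://example.com/"))
```

A negative result does not put a site out of scope: a site under another top-level domain still qualifies if its author is domiciled in France or if it is produced or hosted in France, and neither of those can be read from the URL.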

For more information on the technical operation of crawls
