Frequently Asked Questions on ARK


How can I obtain a NAAN?

You need to register with the California Digital Library (email: ark@cdlib.org). The registration is free of charge. The CDL will assign a 5-digit NAAN to your institution and will update the Name Assigning Authority Number registry.

Who uses ARK?

ARK identifiers are used world-wide by many institutions such as the California Digital Library, Library and Archives Canada, Bibliothèque et Archives nationales du Québec and the Internet Archive.

In France, besides the BnF, many organizations use ARK identifiers. Most of them are public cultural heritage institutions:
  • Institutions involved in the preservation of data in digital form: National Computer Center for Higher Education (CINES), National Centre of Space Research (CNES);
  • Libraries: Public library of Toulouse, Cujas University Library, Mediatèca Occitana (CIRDOC);
  • Archives: National Centre for Overseas Archives; departmental archives of Allier, Cantal, Côte-d'Or, Doubs, Gironde, Marne, Nièvre, Pas-de-Calais, Rhône, Savoie, Somme, Territoire de Belfort; city archives of Pontivy;
  • Other public organizations with a cultural heritage purpose: the Centre of Baroque Music of Versailles (CMBV) and Son d’Aquí for musical heritage, and the National Institute of Art History (INHA);
  • Other public institutions: the Ministry of Culture and Communication (MCC), the National Center for educational documentation (CNDP), the City of Paris, the City of Besançon and the General council of Martinique, as well as the ACCOLAD regional agency.
The complete list of institutions implementing ARK identifiers is available in the form of the Name Assigning Authority Number (NAAN) Registry: http://www.cdlib.org/uc3/naan_registry.txt.

How can I mint my own identifiers?

a/ Define the structure of your ARK names

First, you need a NAAN.

Then, you should define your own ARK implementation policy: what do you want to identify with ARKs? Bibliographic records? Physical documents? Digital documents? Concepts? If you intend to use ARK for several categories of resources, you should use defined prefixes for each particular subset.

For example, at BnF, ARK names that begin with "cb" are used for catalog records from our General Catalog; identifiers beginning with "b" are used for digital documents; and, for the latter, a "pt6k" prefix is used for digitized books and press, while "tv1b" identifies still images and manuscripts.
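As an illustration, the prefix policy described above can be sketched in a few lines of Python. This is only a minimal sketch: the NAAN "12345", the sample names and the function names are hypothetical, and the prefix table simply restates the two top-level BnF prefixes mentioned in this answer.

```python
# Hypothetical placeholder; a real NAAN is assigned by the CDL.
EXAMPLE_NAAN = "12345"

# Prefix -> resource category, restating the convention described above.
# "cb" is listed before "b" so that the longer prefix is tested first.
PREFIX_CATEGORIES = [
    ("cb", "catalog record"),
    ("b", "digital document"),
]


def build_ark(naan: str, name: str) -> str:
    """Assemble an ARK identifier from a NAAN and an ARK name."""
    return f"ark:/{naan}/{name}"


def categorize(name: str) -> str:
    """Return the resource category implied by the prefix of an ARK name."""
    for prefix, category in PREFIX_CATEGORIES:
        if name.startswith(prefix):
            return category
    return "unknown"


if __name__ == "__main__":
    # Sample names are made up for the example.
    for sample in ("cb0000001", "bpt6k0000001"):
        print(build_ark(EXAMPLE_NAAN, sample), "->", categorize(sample))
```

Testing the longer prefix first matters here because every name starting with "cb" does not start with "b", but a policy with overlapping prefixes would otherwise be ambiguous.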

b/ Use a piece of software to mint your ARK identifiers

Then, you need a software application that generates identifiers which comply with the ARK specification and are unique within your own NAAN. This can of course be a tool developed in-house, but an open source application already exists for this purpose: NOID (Nice Opaque Identifiers). It has been designed to mint unique identifiers, and can generate ARK identifiers with the appropriate configuration.
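To give an idea of what an in-house tool has to guarantee, here is a minimal sketch of a minter in Python. It is not NOID and does not reproduce its behaviour: the class name, the alphabet (digits and consonants), the name length and the in-memory set are all assumptions made for the example. A real minter would record issued names in a persistent store.

```python
import secrets

# Assumed alphabet: digits plus consonants, so that generated names
# are unlikely to spell words. This is a choice made for this sketch.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


class Minter:
    """Mint opaque ARK identifiers that are unique within one NAAN."""

    def __init__(self, naan: str, prefix: str = "", length: int = 8):
        self.naan = naan
        self.prefix = prefix
        self.length = length
        self._issued = set()  # in-memory only; use a database in practice

    def mint(self) -> str:
        """Return a new identifier that this minter has never issued."""
        while True:
            body = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
            name = self.prefix + body
            if name not in self._issued:  # retry on the rare collision
                self._issued.add(name)
                return f"ark:/{self.naan}/{name}"


if __name__ == "__main__":
    minter = Minter("12345", prefix="b")  # hypothetical NAAN and prefix
    print(minter.mint())
```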

A new Web service from the CDL, EZID, allows you, once registered, to mint and maintain your own ARK identifiers. This service is available as an online interface and as an API that supports automated mass identifier generation. It is free of charge as long as the institution uses ARK identifiers (the service also works for DOI identifiers).

c/ Use a piece of software to resolve ARK identifiers

Then, you need a software application that maps a Web address containing an ARK name to the corresponding resource.

You must define which host(s) (or NMAH(s)) will resolve the ARK identifiers in your institution. At BnF for example, two hosts are used: gallica.bnf.fr for digital documents and catalogue.bnf.fr for catalog records. A generic ark.bnf.fr resolver is also used to redirect an identifier to the relevant NMAH (either gallica.bnf.fr or catalogue.bnf.fr), depending on the type of resource. Both NOID and the EZID service mentioned in b/ allow you to perform these actions.
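The routing performed by a generic resolver can be sketched as follows in Python. This is a simplified illustration, not the actual BnF resolver: the function name, the use of https and the exact layout of the target URL are assumptions, and the NAAN used in the example is a placeholder. Only the two host names and the "cb" and "b" prefixes come from this page.

```python
# Prefix -> NMAH, following the BnF example described above.
# "cb" is tested before "b" so catalog records are not sent to gallica.
HOST_BY_PREFIX = [
    ("cb", "catalogue.bnf.fr"),
    ("b", "gallica.bnf.fr"),
]


def resolve(ark: str) -> str:
    """Return the URL on the relevant NMAH for an 'ark:/NAAN/name' string."""
    if not ark.startswith("ark:/"):
        raise ValueError(f"not an ARK identifier: {ark!r}")
    parts = ark[len("ark:/"):].split("/", 1)
    if len(parts) != 2 or not parts[0] or not parts[1]:
        raise ValueError(f"missing NAAN or name: {ark!r}")
    naan, name = parts
    for prefix, host in HOST_BY_PREFIX:
        if name.startswith(prefix):
            # Assumed URL layout: scheme, NMAH, then the ARK unchanged.
            return f"https://{host}/ark:/{naan}/{name}"
    raise LookupError(f"no host configured for name {name!r}")


if __name__ == "__main__":
    # "12345" is a placeholder NAAN; the names are made up.
    print(resolve("ark:/12345/cb0000001"))
    print(resolve("ark:/12345/bpt6k0000001"))
```

A real resolver would answer with an HTTP redirect to the returned URL rather than print it.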

Then, you have to define a set of qualifiers that will allow you to ask for a part of a given resource (e.g. a page of a digital document) or services on this resource (particular version of a document, document display, format for a bibliographic record...). For instance, BnF uses /fn to ask for a particular page of a digital document (n standing for the page number), and .chemindefer displays the first page of the document, with the thumbnails of all its pages displayed as a flatplan.

In a nutshell, this step requires you to choose an NMAH, that is, a host that will resolve ARK identifiers, and qualifiers, if you need to ask for parts of, or services on, the identified resource.
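The qualifier handling described above can be sketched as a small parser in Python. It only covers the two qualifier forms mentioned in this answer ("/fn" for a page and a ".service" suffix such as .chemindefer). The function name, the returned dictionary and the rule that a page qualifier comes before a service qualifier are assumptions made for the example, as is the placeholder NAAN.

```python
import re

# ark:/NAAN/name, then an optional "/f<page>" and an optional ".<service>".
QUALIFIED_ARK = re.compile(
    r"^ark:/(?P<naan>\d+)/(?P<name>[^/.]+)"
    r"(?:/f(?P<page>\d+))?"
    r"(?:\.(?P<service>[A-Za-z]+))?$"
)


def parse_qualified_ark(ark: str) -> dict:
    """Split an ARK into NAAN, name, page number and service qualifier."""
    match = QUALIFIED_ARK.match(ark)
    if match is None:
        raise ValueError(f"unsupported ARK or qualifier: {ark!r}")
    page = match.group("page")
    return {
        "naan": match.group("naan"),
        "name": match.group("name"),
        "page": int(page) if page is not None else None,
        "service": match.group("service"),  # None when no service is asked for
    }


if __name__ == "__main__":
    # Placeholder NAAN and made-up name.
    print(parse_qualified_ark("ark:/12345/bpt6k0000001/f12"))
    print(parse_qualified_ark("ark:/12345/bpt6k0000001.chemindefer"))
```

When "page" and "service" are both None, the resolver has to fall back on whatever default behaviour your policy defines.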

What is the difference between ARK and DOI identifiers?

Like ARK, DOI (Digital Object Identifier) is a persistent identifier scheme. While ARK comes mainly from the community of public cultural institutions (libraries, archives and museums in particular), DOI originates from the publishing and e-commerce community, and is often used to identify online articles and publications.

Generating a DOI identifier is subject to a charge on a per-identifier basis, but the exact fee is set by each DOI registration agency (RA), e.g. DataCite or CrossRef.

The Handle system, often associated with DOI, is a tool for minting and managing identifiers in general and DOIs in particular; the EZID service mentioned above can also do this. Each resource identified by a DOI must have associated metadata expressed according to the INDECS data dictionary. Therefore, you have to convert any pre-existing metadata to this format.

Thus, although both are persistent identifier schemes, DOI and ARK have rather different approaches and communities. ARK is grounded in the free and rather centralized model of public cultural heritage institutions; persistence is therefore conceived for the very long term. Great autonomy is granted to every name assigning authority, the CDL being responsible only for maintaining the specification and the NAAN registry. By contrast, DOI is grounded in the more commercial and decentralized model of publishers and online data providers. In practice, each DOI agency is the operating level at which most technical choices are made and most services are offered; with ARK, this intermediary level does not exist, and every ARK name assigning authority is free to define its own policy and services. The choice of ARK, DOI or any other identifier scheme therefore depends on your goals and your strategy.

Why use ARK and not simple URLs?

URLs are character strings providing access to a resource by means of the HTTP or HTTPS protocol. They have the advantage of giving immediate access to the resource, since the HTTP protocol is used everywhere on the Web. However, a URL can easily be "broken", that is, it no longer gives access to a resource (or to the same resource as before). There are many possible reasons for that, including the following:
  • The resources are still there but they have been relocated to a different address: this often occurs when the architecture of a website is changed, or when the resources are moved to another site or host.
  • The resources have been withdrawn from the Web.
  • The resources have been replaced: in this case the same URL provides access to a different resource.
Using ARK tries to address these problems:
  • A unique, persistent identifier is assigned to the resource;
  • With ARK, if the site or the host of the resources changes, only the NMAH has to be changed; the identifiers remain the same. Maintaining access to resources over the long term is thus made easier.
  • If one wants to reference a particular version of a resource, this can be handled by defining and implementing a service qualifier. However, a default behaviour must be defined for such additional services (that is, which service is used when the user does not ask for a specific one).
  • An institution using ARK, by defining its policy, has taken on the responsibility of maintaining the link between an ARK identifier and the resource it identifies. If a resource is removed, you have at least to give a minimal description of the resource and explain that it has been removed.
Of course, these problems can also be addressed by defining, at the level of a particular institution, well-defined, easy-to-maintain URLs and a consistent URL policy. The difference is that the ARK specification explicitly takes these problems into account and addresses them, so it provides a good overview of the long-term access problems you have to consider. ARK forces you to take these long-term access issues into account from the beginning, which is not the case for URLs.
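The point about changing only the NMAH can be illustrated with a short Python sketch. The function name and both host names are hypothetical, and using https for the rebuilt URL is an assumption. The idea is simply that everything from "/ark:/" onwards is kept unchanged when the host changes.

```python
def rebase_ark_url(url: str, new_nmah: str) -> str:
    """Move an ARK URL to a new host, keeping the identifier untouched."""
    marker = "/ark:/"
    index = url.find(marker)
    if index == -1:
        raise ValueError(f"no ARK found in URL: {url!r}")
    # Everything from "/ark:/" onwards (NAAN, name, qualifiers) is preserved.
    return f"https://{new_nmah}{url[index:]}"


if __name__ == "__main__":
    # Hypothetical hosts and placeholder NAAN.
    old_url = "https://old-host.example.org/ark:/12345/bpt6k0000001/f12"
    print(rebase_ark_url(old_url, "new-host.example.org"))
```

With plain URLs there is no such stable part to preserve, so every stored link would have to be rewritten according to the new site structure.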

Thursday, May 10, 2012

Contact

Sébastien Peyrard
Bibliographic and Digital Information Department
sebastien.peyrard@bnf.fr