Innovation and press collections
The European project NewsEye: A Digital Investigator for Historical Newspapers brought together national libraries, humanities and social science research groups, and computer science research groups. Its aim was to facilitate access to the digitised historical press of the period 1850-1950 and to increase users' ability to access, analyse and exploit this content.
To this end, it developed a toolkit for analysing digitised newspapers on a very large scale, regardless of their language. This work on improving the quality of articles, semantic enrichment (named entities for names of people, places, countries, etc.), stance detection (opinions) and dynamic text analysis led to the creation of a multilingual personal research assistant, designed in particular to meet the new needs of researchers in the digital humanities.
Alongside its technological component, the project carried out case studies to test existing tools as well as those it developed. Their themes were migration, gender, nationalism, and media and journalism. In this context, a series of articles examining current news in the light of the historical press was published on the Gallica blog. The topics covered include women in trousers and the media history of the curfew.
In March 2021, a conference organised within the framework of this project also focused on research in digitised corpora of historical press. It brought together specialists in digital humanities, IT researchers and library professionals, and presented several tools through concrete examples of research connected to the theme of women.
Running from May 2018 to February 2022, the project brought together the Computer Image and Interaction Laboratory (L3i) of La Rochelle University, which coordinated the project; the national libraries of Austria, Finland and France; and the universities of Helsinki, Innsbruck, Paul Valéry Montpellier 3, Rostock and Vienna.
NewsEye builds on the results of previous projects, in particular Europeana Newspapers, in the areas of Optical Character Recognition (OCR), analysis of newspaper structure and multilingual content processing (recognition of named entities, stance analysis, and text and data mining).
The BnF participated in this project in order to strengthen its expertise in the enhancement of digital documents, to promote digitised press collections for its audiences, and to develop text and data mining tools on a collection with a high consultation rate in Gallica and Retronews.
The aim of the Europeana Newspapers project was to facilitate access, via Europeana, to a multilingual European daily press collection of nearly 18 million searchable newspaper pages, by optimising the automatic recognition of newspaper articles and semantically enriching the metadata related to this content.
A pioneering project in newspaper digitisation and online consultation, it brought together 17 partners, including 9 national libraries (Austria, Estonia, Finland, France, Latvia, the Netherlands, Poland, Turkey and the UK).
The BnF produced a total of 2.4 million digitised newspaper pages, of which 1.4 million were OCRed and 1 million were structured at article level (Optical Layout Recognition – OLR) and integrated into Retronews and Gallica, thus improving users’ online search experience of these collections.
Between 2012 and 2015, this project played a major part in changing the BnF's OCR processing, and in particular in improving OCR quality control for its mass digitisation contracts. The automatic recognition of named entities in French was developed with the help of the Laboratoire d’Informatique de Paris 6 (LIP6) at the Sorbonne. This development was the first of its kind at the time.
The Europeana Newspapers and NewsEye projects have both received European funding. Europeana Newspapers received funding from the Competitiveness and Innovation Programme between 2012 and 2015, and NewsEye received funding from the European Union’s Horizon 2020 Research and Innovation Framework Programme (grant agreement no. 770299).