Interview with Julie Claustre-Mayade on the e-NDP project - Notre-Dame de Paris and its cloister
An unprecedented textual documentation project, part of the broader effort to restore the Notre-Dame de Paris cathedral
Julie Claustre Mayade is Associate Professor in History, civilization, archeology and art of Antiquity and the Middle Ages. She acts as lead researcher on the e-NDP project “Notre-Dame de Paris and its cloister”, which, from March 2021 to August 2024, analyzes the cathedral’s decisions records from 1326 to 1504 as well as the libraries of the chapter and its canons from the Middle Ages down to Modern times. The e-NDP project aims to secure and ultimately broaden our knowledge of the society, economy and built heritage of the Paris cloister and its dependencies.
We met Julie Claustre Mayade to discuss what the project entails and how it benefits from artificial intelligence.
Julie Claustre Mayade: The e-NDP project was approved in September 2020 with funding from the Agence nationale de la recherche (ANR) in the wake of the 2019 Notre-Dame de Paris cathedral structural fire. It brings together experts in digital humanities, textual editing, Paris history and textual scholarship from the LaMOP (Laboratoire de médiévistique occidentale de Paris) and the École nationale des Chartes’ Jean Mabillon center, and draws on the resources of three great historic institutions housing archives and documentation relative to Notre-Dame de Paris: the Archives nationales, the Bibliothèque nationale de France and the bibliothèque Mazarine.
Scholars and curators work in tandem under the e-NDP banner to further our knowledge of the history of the cathedral’s chapter and cloister while ensuring that the resources vital to our understanding of Notre-Dame as a whole remain publicly available. The ANR, in turn, provides financial support for the hiring of specialized workers and for costly operations such as document digitization and restoration. The program officially started on March 1, 2021 and will run for 42 months, ending in August 2024.
In the research environment formed after the Notre-Dame fire, e-NDP is the only project dedicated to textual documents relative to the cathedral’s history. While other projects tend to focus on the building itself (on material remains like stones, stained-glass panels or timbers) and on its architecture, e-NDP has different interests, focusing rather on the borough surrounding the cathedral and thus also on the seigneury of the Notre-Dame de Paris chapter. It also aims to make information widely available, which could allow other projects to account for elements of the cathedral’s built heritage.
e-NDP investigates the archives of a centuries-old institution, the Notre-Dame chapter
Julie Claustre Mayade: 51 canons were part of the chapter’s community gathered around the Paris bishop. They met three times per week to decide matters concerning the management of the chapter’s rights, the administration of the cathedral and its heritage, and the affairs of the Notre-Dame cloister society. Impacting thousands of men and women, the chapter’s ecclesiastical power extended within Paris itself, but also within the kingdom of France and the Christian world at large. For instance, the charitable assistance provided by the Hôtel-Dieu hospital, which was the largest in the kingdom, was overseen by the chapter. In addition, the chancellor of the Notre-Dame chapter was also chancellor of the University, and numerous canons were active in scholarly circles. Until the Revolution, the Notre-Dame chapter also benefited from total legal and fiscal exemptions with respect to the city and the king. The chapter’s decisions records thus reveal to us the inner workings of what used to be a powerful Paris institution.
Using AI to more easily analyze a massive corpus
Julie Claustre Mayade: The problem we encounter when analyzing the Notre-Dame chapter’s decisions records is that it is a massive corpus which, despite being of great interest, cannot be tackled by a researcher working on their own. Until now, it has been investigated through modern, incomplete excerpts produced by chapter archivists. Digitizing the original documents and editing them to make them more accessible will very likely improve and expand our knowledge, while also making interdisciplinary exchanges more likely and easier. This is why members of the e-NDP project work in close collaboration with the archeologists and art historians working on the cathedral’s restoration, and try to set up procedures to keep all parties informed of relevant discoveries. Work done in archeology and art history, for example, did not include projects specific to documents and records, and had researchers rely on the existing bibliography or hire contractors to do one-time research in the archives. With the e-NDP project, we study documents systematically: we perform automated handwritten text recognition (HTR) on the original Latin manuscripts, and we can then search and investigate them in a systematic way.
The chapter’s decisions records began in 1326. Held at the Archives nationales, they form a dense series up until the Revolution, with a total of 170 records. For the e-NDP project, we use two corpora: on the one hand, the 26 medieval records, covering the years 1326-1504, and amounting to about 14,700 pages of decisions; and, on the other hand, the books from the chapter’s library. The daily life of the entire chapter is recorded in those records. They contain economic and judicial decisions which are both crucial and abundant, but from which information is not easy to retrieve. For instance, the canons could, on the same day, decide the fate of one of the cathedral’s bells, as well as the fate of the windmills in a village attached to the chapter. We plan to use HTR to transcribe the entirety of the 26 medieval records, and to properly edit three of them, and we have hired a postdoctoral student on a 14-month contract to compile and publish a catalog of the chapter’s old library.
AI, a tool used in medieval studies for years
Julie Claustre Mayade: Before I started working on this project, I did not have any experience with artificial intelligence. I had, however, heard about it at a 2015 conference, and I then became interested in trying Transkribus, the AI-assisted HTR software designed by a lab in Austria, which was later paywalled. The software enabled us to run text-acquisition tests based on a digital photograph of a document. With the help of Pierre Brochard, a LaMOP research engineer, I had a Master’s student, Hugo Regazzi (now a doctoral student with the e-NDP project), run several of these tests. Awareness of this potential use for AI had then just started to spread among medievalists.
We then discussed with the École nationale des chartes, and specifically with the Jean Mabillon center, since one of their services has interest and expertise in HTR. A partnership between the École nationale des chartes and the Institut national de recherche en sciences et technologies du numérique (INRIA) led to the creation of an infrastructure for automated text transcription. This infrastructure is now called eScriptorium, and it is used by numerous and diverse projects involving work on manuscripts. It is an open access platform, serving as an alternative to Transkribus, and providing suitable work environments for research groups.
Aiming for improvements in AI reading performance
Julie Claustre Mayade: Sergio Torrès, a postdoctoral student we hired on a two-year contract, is in charge of the automated analysis of the chapter’s decisions records, and of AI implementation more generally. It works like this: the AI system starts by suggesting a transcription for a given text, based on the text and image models it was trained with. This is how we obtained a first AI learning model. Following this, our transcription team began meeting in collective workshops for AI training as of October 2022.
Concretely, Sergio Torrès designed an algorithm based on existing data from the École nationale des chartes and the Institut de recherche et d’histoire des textes (IRHT). There were also around 50 record pages which had been transcribed in 2020-2021. This allowed us to have a number of texts whose scripts dated from the same period to use as a textual base. We made drafts on other corpus samples, and then corrected these first drafts. Then we gave these corrected pages to the algorithm and let it run.
Promising first results with HTR
Julie Claustre Mayade: We are still in the training phase for our HTR algorithm. Three training sessions were conducted between November 2021 and January 2022, and text recognition performance is improving. There was an impressive leap forward after the first round of corrections. We also recently obtained a recognition rate of more than 88%, and we think we will shortly achieve a 90% rate.
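The interview does not say how the recognition rate is measured. A common convention in HTR is character accuracy, defined as one minus the character error rate (the edit distance between the AI draft and a human-corrected reference, divided by the length of the reference). The sketch below assumes that convention; the function names and the sample strings are illustrative and do not come from the project.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """1 - CER, where CER = edit distance / length of the reference.
    The value can drop below 0 if the hypothesis is far longer than
    the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)


# Invented example: a human-corrected line and an AI draft with one wrong letter.
reference_line = "in capitulo"
draft_line = "in capitalo"
print(f"{character_accuracy(reference_line, draft_line):.1%}")
```

Under this assumed definition, a rate of 88% would mean roughly 12 character edits per 100 reference characters.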
We are expecting to reach a plateau in the increase of AI performance at some point: we will unavoidably meet obstacles like heterogeneous manuscripts, or problems caused by poor manuscript conservation, or by low digitization quality. Besides, the page format changes from one record to another, which is why we have assigned specific transcribers to specific records. We also realized that AI tends to do less well with certain scripts and certain records. The corpus as a whole remains very diverse and uneven. So text recognition for one record can be satisfactory, but there are records for which mistakes are still common. What is most satisfying for us, however, is the fact that the HTR algorithm makes human reading much easier. Of course, right now, human eyes correct AI, but AI can already save us time, since it sometimes solves reading problems which would have significantly slowed down human readers. Within our project, the benefits of AI are first and foremost an increase in reading capacities.
Post-processing for topic modelling
Julie Claustre Mayade: For the rest of the project, we still have a large number of texts at the draft stage. That said, we are also developing post-processing, since we want to fully automate the recognition of proper names, especially those of locations and persons. This will allow us, for instance, to know what a given canon said, and what concrete actions followed. Our postdoctoral student working on HTR, Sergio Torrès, has already done a lot of work on topic modelling (i.e., searching for specific topics in massive textual datasets) during his PhD. We hope this work may be of use to other Notre-Dame projects focusing on the archeological or architectural aspects of restoration, so the cathedral can be restored as well as possible.
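To give an idea of what recognizing proper names in transcribed entries can make possible, here is a minimal sketch based on a plain dictionary lookup. It is not the project’s method, which the interview does not describe. The name list, the entry dates and the sample sentences are all invented for illustration, apart from the Hôtel-Dieu, which is mentioned above.

```python
import re
from collections import defaultdict

# Hypothetical name list; "Johannes Exemplum" and "Villa Ficta" are invented.
GAZETTEER = {
    "person": ["Johannes Exemplum"],
    "place": ["Villa Ficta", "Hôtel-Dieu"],
}

# Invented entries, keyed by date, standing in for transcribed decisions.
ENTRIES = {
    "1400-01-05": "Johannes Exemplum asked about the mill at Villa Ficta.",
    "1400-01-07": "The chapter discussed the Hôtel-Dieu.",
    "1400-01-09": "johannes exemplum reported on the Hôtel-Dieu.",
}


def index_mentions(entries, gazetteer):
    """Map each (category, name) pair to the entries that mention it."""
    index = defaultdict(list)
    for entry_id, text in entries.items():
        for category, names in gazetteer.items():
            for name in names:
                # Match the whole name only, ignoring case.
                pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
                if re.search(pattern, text, flags=re.IGNORECASE):
                    index[(category, name)].append(entry_id)
    return dict(index)


INDEX = index_mentions(ENTRIES, GAZETTEER)
for (category, name), entry_ids in INDEX.items():
    print(category, name, entry_ids)
```

A lookup like this only finds spellings that are already in the list. Medieval Latin records vary in spelling and abbreviation, which is one reason a real pipeline would more likely rely on a trained recognition model.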
To me, HTR algorithms help human readers think and progress in paleography. Since AI saves us transcription time, we can focus on other aspects of historical research. We can also pose more questions to the texts we work on thanks to topic modelling than we previously could, and sometimes even novel questions. For instance, when does the king or pope send his representatives to visit the chapter? And questions like these suddenly take us from daily life in the chapter to geopolitical concerns. This gives us another perspective on the cathedral, and allows us to study its place and role within the city and within Christendom. We can thus have a glimpse of a new cathedral: Notre-Dame de Paris as we have never seen it before!
LaMOP researchers taking part in the project:
- Philippe Bernardi, Senior Scientist at the LaMOP (UMR 8589),
- Pierre Brochard, research engineer at the CNRS,
- Olivier de Chalus, PhD student,
- Julie Claustre, Associate Professor in History, civilization, archeology and art of Antiquity and the Middle Ages, coordinator and leader of the e-NDP project,
- Emilie Cottereau-Gabillet, Associate Professor in History, civilization, archeology and art of Antiquity and the Middle Ages,
- Fabrice Delivré, Associate Professor in History, civilization, archeology and art of Antiquity and the Middle Ages,
- Thierry Kouamé, Associate Professor in History, civilization, archeology and art of Antiquity and the Middle Ages,
- Stéphane Lamassé [url: https://www.pantheonsorbonne.fr/page-perso/lamasse], Associate Professor in History, civilization, archeology and art of Antiquity and the Middle Ages,
- Elisabeth Lusset, research fellow,
- Joseph Morsel, Professor in History, civilization, archeology and art of Antiquity and the Middle Ages,
- Hélène Noizet, Associate Professor in History, civilization, archeology and art of Antiquity and the Middle Ages,
- Nicolas Perreaux, research engineer,
- Hugo Regazzi, PhD student recruited for the e-NDP project,
- Darwin Smith, Associate Researcher and second coordinator of the e-NDP project.