The RobotCodico Project

The project partners

- The SAMM laboratory (Statistique, Analyse et Modélisation Multidisciplinaire),

- The Laboratoire de Médiévistique Occidentale de Paris (LaMOP, UMR 8589),

- The Pôle Informatique de Recherche et d'Enseignement en Histoire (PIREH).

General Aims of the Project

The project aims to develop algorithms for analysing digitised images of pages from medieval manuscripts. These manuscripts are studied by historians for their text, but also as objects. Codicology is the discipline that looks at them from this point of view: the material analysis of a manuscript tells us about its production methods (paper, ink, techniques for assembling leaves), its economic and symbolic value (illuminations) and its practical use (commentaries, diagrams, presence of several scripts). Since the late 1970s there has been a trend towards quantitative codicology, which tackles these kinds of questions by working on large bodies of manuscripts using statistical methods.

Our idea is to exploit digitisations of manuscripts made available online by the libraries that hold them (and possibly photographs that can be taken with a camera or smartphone) to automatically produce a large number of indicators of page use. Thanks to this quantitative data, we will be able to paint a picture of manuscript production in the Middle Ages in Western Europe, whether in Latin or the vernacular (English, French, Italian, Spanish, German, etc.).

Depending on the elements (detailed below) that we want to detect, count or measure, we expect to favour either classic image-analysis algorithms (we have begun work on delimiting blocks of text and identifying their lines) or artificial intelligence algorithms (for which we have started to build a corpus of training images).
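To give a concrete idea of what a classic image-analysis approach to line identification can look like, here is a minimal sketch based on a horizontal projection profile. It is an illustration only, not the project's actual method: the function name `find_text_lines`, the `min_ink` threshold and the toy page are all hypothetical, and the sketch assumes the page has already been binarised (1 for ink, 0 for background) and deskewed.

```python
from typing import List, Tuple


def find_text_lines(page: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each run of rows that contain ink.

    `page` is assumed to be a binarised image: 1 = ink pixel, 0 = background.
    `min_ink` is a hypothetical noise threshold: rows with fewer ink pixels
    are treated as blank.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in page]

    lines = []
    start = None  # first row of the line currently being scanned, if any
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # the text line ended on the previous row
            start = None
    if start is not None:                  # line running to the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


# Toy 6x6 "page" with two text lines separated by one blank row.
toy_page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(find_text_lines(toy_page))  # [(1, 2), (4, 4)]
```

On real manuscript pages, stains, stamps and skewed or multi-column layouts would make such a simple profile unreliable on its own, so raising `min_ink` or applying the same idea column by column after block detection would likely be needed.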

Detection objectives

Example 1: Detection of page layout

Example 2: Simple two-column page, with miniature, lettering in red ink, initials slightly detached from the rest of the line, and a lot of noise (stains, tears, BnF stamps, old call number inscriptions, etc.).