Self-supervised learning for medieval handwriting identification. A case study from the Vatican Apostolic Library / Lastilla, L.; Ammirati, S.; Firmani, D.; Komodakis, N.; Merialdo, P.; Scardapane, S.. - In: INFORMATION PROCESSING & MANAGEMENT. - ISSN 0306-4573. - 59:3(2022), pp. 1-20. [10.1016/j.ipm.2022.102875]

Self-supervised learning for medieval handwriting identification. A case study from the Vatican Apostolic Library

Lastilla L.; Firmani D.; Scardapane S.
2022

Abstract

In this paper, we consider the task of automatically identifying whether different parts of medieval and modern manuscripts can be traced back to the same copyist/scribe, a problem of significant interest in paleography. Currently, the application of deep learning techniques in the context of scribe recognition has been hindered by the lack of a sufficiently large, labeled dataset, since the labeling process is incredibly complex and time-consuming. Here, we propose the first successful application of the recent framework of self-supervised learning to the field of digital paleography, wherein we pretrain a convolutional neural network by leveraging large amounts of unlabeled manuscripts. To this end, we build a novel dataset consisting of both labeled and unlabeled manuscripts for copyist identification extracted from the Vatican Apostolic Library. We show that fine-tuning this model to the task of interest significantly outperforms other baselines, including the common setup of initializing the network from general-domain features, or training the model from scratch, also in terms of generalization power. Overall, our results reveal the strong potential of self-supervised techniques in the field of digital paleography, where unlabeled data (i.e., digitized manuscripts) is nowadays available, while labeled data is scarcer.
handwriting identification; manuscripts; self-supervised learning
01 Journal publication::01a Journal article
Files attached to this item

File: Lastilla_Post-print_Self-supervised_2022.pdf
Access: Open Access since 10/02/2024
Type: Post-print document (version following peer review and accepted for publication)
License: Creative Commons
Size: 10.14 MB
Format: Adobe PDF

File: Lastilla_Self_supervised_2022.pdf
Access: archive administrators only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 3.24 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1611118
Citations
  • PMC: not available
  • Scopus: 4
  • Web of Science: 1