Research products catalog

Self-supervised learning for medieval handwriting identification. A case study from the Vatican Apostolic Library

Lastilla L.; Ammirati S.; Firmani D.; Komodakis N.; Merialdo P.; Scardapane S.

2022

Abstract

In this paper, we consider the task of automatically identifying whether different parts of medieval and modern manuscripts can be traced back to the same copyist/scribe, a problem of significant interest in paleography. So far, the application of deep learning techniques to scribe recognition has been hindered by the lack of a sufficiently large labeled dataset, since the labeling process is complex and time-consuming. Here, we propose the first successful application of the recent framework of self-supervised learning to the field of digital paleography, in which we pretrain a convolutional neural network on large amounts of unlabeled manuscripts. To this end, we build a novel dataset for copyist identification, consisting of both labeled and unlabeled manuscripts extracted from the Vatican Apostolic Library. We show that fine-tuning this pretrained model on the task of interest significantly outperforms other baselines, including the common setup of initializing the network from general-domain features and training the model from scratch, including in terms of generalization power. Overall, our results reveal the strong potential of self-supervised techniques in digital paleography, where unlabeled data (i.e., digitized manuscripts) is now available, while labeled data is scarcer.

	Year of publication
	
			2022
		
	Keywords
	
			handwriting identification; manuscripts; self-supervised learning
		
	Type
	
			01 Journal publication::01a Journal article
		
	Citation
	
			Self-supervised learning for medieval handwriting identification. A case study from the Vatican Apostolic Library / Lastilla, L.; Ammirati, S.; Firmani, D.; Komodakis, N.; Merialdo, P.; Scardapane, S. - In: INFORMATION PROCESSING & MANAGEMENT. - ISSN 0306-4573. - 59:3(2022), pp. 1-20. [10.1016/j.ipm.2022.102875]
		
	Belongs to type:
	
			01a Journal article

Files attached to this product

File	Size	Format
Lastilla_Post-print_Self-supervised_2022.pdf Open Access since 10/02/2024 Type: Post-print document (version after peer review and accepted for publication) License: Creative Commons	10.14 MB	Adobe PDF
Lastilla_Self_supervised_2022.pdf Archive managers only Type: Publisher's version (published version with the publisher's layout) License: All rights reserved	3.24 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1611118
