Writer Identification in Historical Handwritten Documents: A Latin Dataset and a Benchmark

Fagioli, A.; Avola, D.; Cinque, L.; Colombi, E.; Foresti, G. L.

doi:10.1007/978-3-031-51026-7_39

Writer identification refers to the process of determining or attributing the authorship of a document to a specific individual through the analysis of various elements such as writing style, linguistic characteristics, and other textual features. This is a relevant task in heterogeneous fields such as cybersecurity, forensics, or linguistics and becomes particularly challenging when considering historical documents. In fact, the latter might present deterioration due to time, often lack signatures, and could be authored by multiple people. Complicating matters further, scribes were trained to mimic handwriting meticulously when copying manuscripts, making author identification of such documents even more difficult. In this context, this paper introduces a curated collection of Latin documents from the Genesis and Gospel of Matthew specifically gathered for the purpose of exploring the writer identification task. In particular, the dataset comprises over 400 pages, written by nine distinct persons. The primary objective is to explore the efficacy of state-of-the-art deep learning architectures in accurately ascribing historical texts to their rightful authors. To this end, this paper conducts extensive experiments, utilizing varying training set sizes and employing diverse pre-processing techniques to assess the performance and capabilities of these renowned models on the writer identification task while also providing the community with a baseline on the introduced collection.

Writer Identification in Historical Handwritten Documents: A Latin Dataset and a Benchmark / Fagioli, A.; Avola, D.; Cinque, L.; Colombi, E.; Foresti, G. L. - 14366 (2024), pp. 465-476. (Paper presented at the Workshops hosted by the 22nd International Conference on Image Analysis and Processing, ICIAP 2023, held in Italy) [10.1007/978-3-031-51026-7_39].

Year of publication: 2024
Conference name: Workshops hosted by the 22nd International Conference on Image Analysis and Processing, ICIAP 2023
Keywords: Benchmark; Deep Learning; Historical Handwritten Documents; Writer Identification
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1713468
