Writer Identification in Historical Handwritten Documents: A Latin Dataset and a Benchmark / Fagioli, A.; Avola, D.; Cinque, L.; Colombi, E.; Foresti, G. L. - 14366:(2024), pp. 465-476. (Paper presented at the Workshops hosted by the 22nd International Conference on Image Analysis and Processing, ICIAP 2023, held in Italy) [10.1007/978-3-031-51026-7_39].

Writer Identification in Historical Handwritten Documents: A Latin Dataset and a Benchmark

Avola D.; Cinque L.
2024

Abstract

Writer identification is the process of attributing a document to a specific individual by analyzing elements such as writing style, linguistic characteristics, and other textual features. The task is relevant in fields as varied as cybersecurity, forensics, and linguistics, and it becomes particularly challenging for historical documents. Such documents may have deteriorated over time, often lack signatures, and may have been written by multiple people. Moreover, scribes were trained to mimic handwriting meticulously when copying manuscripts, which makes identifying the writer of such documents even more difficult. In this context, this paper introduces a curated collection of Latin documents from the Book of Genesis and the Gospel of Matthew, gathered specifically to explore the writer identification task. The dataset comprises over 400 pages written by nine distinct writers. The primary objective is to evaluate how accurately state-of-the-art deep learning architectures attribute historical texts to their writers. To this end, the paper reports extensive experiments with varying training set sizes and diverse pre-processing techniques, assessing the performance of these models on the writer identification task and providing the community with a baseline on the introduced collection.
Workshops hosted by the 22nd International Conference on Image Analysis and Processing, ICIAP 2023
Benchmark; Deep Learning; Historical Handwritten Documents; Writer Identification
04 Publication in conference proceedings::04b Conference paper in volume
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1713468