Research Products Catalog

In codice ratio: OCR of handwritten Latin documents using deep convolutional networks

Firmani D.; Merialdo P.; Nieddu E.; Scardapane S.

2017

Abstract

Automatic transcription of historical handwritten documents is a challenging research problem that generally requires expensive transcriptions from expert paleographers. In Codice Ratio is designed as an end-to-end architecture that instead requires limited labeling effort; its aim is the automatic transcription of a portion of the Vatican Secret Archives (one of the largest historical libraries in the world). In this paper, we describe in particular the design of our OCR component for Latin characters. To this end, we first annotated a large corpus of Latin characters with a custom crowdsourcing platform. Leveraging recent progress in deep learning, we then designed and trained a deep convolutional network that achieves an overall accuracy of 96% over the entire dataset, which is one of the highest results reported in the literature so far. Our training data are publicly available.

Full record

Publication year: 2017
Conference name: 11th International Workshop on Artificial Intelligence for Cultural Heritage, AI*CH 2017
Keywords: Deep convolutional neural networks; Handwritten text recognition; Medieval documents; Optical character recognition
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Citation: In codice ratio: OCR of handwritten Latin documents using deep convolutional networks / Firmani, D.; Merialdo, P.; Nieddu, E.; Scardapane, S. - 2034:(2017), pp. 9-16. (Paper presented at the 11th International Workshop on Artificial Intelligence for Cultural Heritage, AI*CH 2017, held in Bari, Italy).
Belongs to type: 04b Conference paper in volume

Files attached to this product

File	Size	Format
Firmani_In-codice-ratio_2017.pdf (open access; type: publisher's version, published with the publisher's layout; license: Creative Commons)	2.88 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1335723
