

Towards the Creation of a Diachronic Corpus for Italian: A Case Study on the GDLI Quotations

Manuel Favaro (first author; Writing – Original Draft Preparation); Elisa Guadagnini (Writing – Original Draft Preparation);

2022

Abstract

In this paper we describe some experiments on a corpus derived from an authoritative historical Italian dictionary, the Grande dizionario della lingua italiana (‘Great Dictionary of the Italian Language’, GDLI for short). Thanks to the digitization and structuring of this dictionary, we have been able to build the initial core of a diachronic annotated corpus. It selects, according to specific criteria and distinguishing between prose and poetry, some of the quotations that illustrate the different definitions and sub-definitions within the dictionary entries. The GDLI contains a vast collection of quotations spanning the entire history of the Italian language, from the Middle Ages to the present day. The corpus was enriched with linguistic annotation and used to train and evaluate NLP models for POS tagging and lemmatization, with promising results.

Year of publication: 2022
Conference name: LREC 2022
Keywords: Diachronic Corpus; Adaptation of Annotation Tools; Historical Dictionaries
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Citation: Towards the Creation of a Diachronic Corpus for Italian: A Case Study on the GDLI Quotations / Favaro, Manuel; Guadagnini, Elisa; Sassolini, Eva; Biffi, Marco; Montemagni, Simonetta. (2022), pp. 94-100. (LREC 2022, Marseille).
Belongs to type: 04b Conference paper in volume

Files attached to this product

File	Size	Format
Favaro_Towards-the-Creation_2022.pdf (open access; Type: Publisher's version, i.e. the version published with the publisher's layout; License: Creative Commons)	517.58 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1713458
