POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian / Favaro, Manuel; Biffi, Marco; Montemagni, Simonetta. - In: IJCOL. - ISSN 2499-4553. - 9:2(2023), pp. 99-120.

POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian

Manuel Favaro (first author; Writing – Original Draft Preparation)
2023

Abstract

The paper discusses the challenges of POS tagging and lemmatization of historical varieties of Italian. For both tasks, it reports the results of experiments carried out in a classical supervised domain adaptation scenario, using the diachronic and typologically differentiated corpus built for the "Vocabolario Dinamico dell’Italiano Moderno" (VoDIM). As regards POS tagging, the effectiveness of retrained models is illustrated and substantiated with quantitative data, with specific attention to the linguistic annotation results obtained for particular stages of language evolution, domains and textual genres. For lemmatization, several customized models were developed, including lexicon-assisted models and models retrained on annotated historical texts. For both tasks, a detailed error analysis is provided.
Keywords: NLP; Old Italian; Historical Varieties
Publication type: 01 Journal publication :: 01a Journal article

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1713496

Warning: the data displayed have not been validated by the university.
