This paper presents some of the resources developed within the TrAVaSI project (Automatic Treatment of Italian Historical Varieties). In particular, we illustrate the corpora and morphological lexicons built to improve the performance of lemmatization, which remains a challenge when dealing with historical varieties of a language. We compare the results achieved by extending the dictionary with those achieved by training the lemmatization model on historical corpora. A preliminary analysis of lemmatization error types is also reported.
Trattamento automatico del linguaggio e varietà storiche di italiano: la sfida della lemmatizzazione [Natural language processing and historical varieties of Italian: the challenge of lemmatization] / Favaro, Manuel; Biffi, Marco; Montemagni, Simonetta. - (2022), pp. 392-399. (Paper presented at the JADT 2022 conference, held in Naples).