This article introduces the CELI corpus, a new learner corpus of written Italian consisting of ca. 600,000 tokens, evenly distributed among CEFR (Common European Framework of Reference for Languages) proficiency levels B1, B2, C1 and C2. The collected texts derive from the language certification exams administered worldwide by the University for Foreigners of Perugia. The corpus contains rich metadata pertaining to text-related and learner-related variables. It expands the domain of learner corpora by, among other things, being freely available online to the research community and focusing on a target language other than English. The article also presents and evaluates the POS-tagging procedure, thus contributing to best practices in learner corpus annotation.

The CELI corpus: Design and linguistic annotation of a new online learner corpus / Spina, Stefania; Fioravanti, Irene; Forti, Luciana; Zanda, Fabio. - In: SECOND LANGUAGE RESEARCH. - ISSN 0267-6583. - (2023). [10.1177/02676583231176370]

The CELI corpus: Design and linguistic annotation of a new online learner corpus

Stefania Spina; Fabio Zanda
2023

Italian; learner corpora; annotation
01 Journal publication::01a Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1758259