From PEC to PEC24: a new reference corpus for Italian / Spina, S; Zanda, F; Fioravanti, I. - In: ITALIANO LINGUADUE. - ISSN 2037-3597. - 17:1(2025), pp. 745-768. [10.54103/2037-3597/29101]

From PEC to PEC24: a new reference corpus for Italian

Spina, S; Zanda, F; Fioravanti, I
2025

Abstract

This article introduces the PEC24, an extension of the Perugia corpus, as a new reference corpus for Italian. The update mainly concerns the size of the corpus, which now consists of approximately 47 million tokens, with an addition of over 100,000 texts. The PEC24 maintains the same structure as its predecessor: it is divided into ten sections, representing ten different written and spoken genres. After reviewing the spoken, written, and web corpora available for Italian, the article describes the internal composition of each section of the corpus and explains how the corpus was annotated. Because the PEC24 is available and searchable online, the article also illustrates how it can be queried. The PEC24 represents a significant advancement in the panorama of Italian corpora, offering a representative and more comprehensive resource for linguistic research and corpus-based studies.
corpora; Italian; reference corpus; written Italian; spoken Italian; corpus linguistics
01 Journal publication::01a Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1758260