
From PEC to PEC24: a new reference corpus for Italian

Spina S; Zanda F; Fioravanti I

2025

Abstract

This article introduces the PEC24, an extension of the Perugia corpus, as a new reference corpus for Italian. The update mainly concerns the size of the corpus, which now consists of approximately 47 million tokens, following the addition of over 100,000 texts. The PEC24 maintains the same structure as its predecessor: it is divided into 10 sections, each representing a different written or spoken genre. After reviewing the spoken, written, and web corpora available for Italian, the article describes the internal composition of each section of the corpus and explains how the corpus was annotated. Because the PEC24 is available and searchable online, the article also illustrates examples of how it can be queried. In conclusion, the PEC24 represents a significant advancement in the panorama of Italian corpora, offering a representative and more comprehensive resource for linguistic research and corpus-based studies.

Full record

	Year of publication
	
				2025
			
	Keywords
	
				corpora; Italian; reference corpus; written Italian; spoken Italian; corpus linguistics
			
	Type
	
				01 Journal publication::01a Journal article
			
	Citation
	
				From PEC to PEC24: a new reference corpus for Italian / Spina, S; Zanda, F; Fioravanti, I. - In: ITALIANO LINGUADUE. - ISSN 2037-3597. - 17:1(2025), pp. 745-768. [10.54103/2037-3597/29101]

Files attached to this product

There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1758260

Warning: the data displayed have not been validated by the university.
