Estimation of the Frequency of Occurrence of Italian Phonemes in Text

Luca De Nardis;Maria Gabriella Di Benedetto
2021

Abstract

The purpose of this project was to derive a reliable estimate of the frequency of occurrence of the 30 phonemes of the Italian language, plus the geminated counterparts of the consonants, based on four selected written texts. Since no comparable dataset was found in the previous literature, the present analysis may serve as a reference for future studies. Four textual sources were considered: Come si fa una tesi di laurea: le materie umanistiche by Umberto Eco, I promessi sposi by Alessandro Manzoni, a recent article in Corriere della Sera (a popular Italian daily newspaper), and In altre parole by Jhumpa Lahiri. The sources were chosen to represent varied genres, subject matter, time periods, and writing styles. The results, which included an analysis of variance, showed that for all four sources the frequencies of occurrence reached relatively stable values after about 6,000 phonemes (approx. 1,250 words), varying by less than 0.025% beyond that point. Estimated frequencies are provided for each individual source and as an average across sources.
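The idea of a frequency estimate that stabilizes as more text is read can be illustrated with a minimal Python sketch. This is not the authors' procedure: the abstract does not state how stability was measured, so the checkpoint-to-checkpoint criterion below is an assumption. The function names and the `step` and `tol` parameters are hypothetical. The input is assumed to be an already transcribed sequence of phoneme symbols (grapheme-to-phoneme conversion is not shown).

```python
from collections import Counter


def running_frequencies(phonemes, step):
    """Yield (n, {phoneme: relative frequency}) after every `step` phonemes."""
    counts = Counter()
    for n, p in enumerate(phonemes, start=1):
        counts[p] += 1
        if n % step == 0:
            yield n, {k: v / n for k, v in counts.items()}


def stabilization_point(phonemes, step, tol):
    """Return the first checkpoint n at which no phoneme's relative frequency
    moved by more than `tol` since the previous checkpoint, or None.

    Assumed criterion for illustration only; `tol` is a fraction (0.00025
    corresponds to 0.025%).
    """
    prev = None
    for n, freqs in running_frequencies(phonemes, step):
        if prev is not None:
            keys = set(prev) | set(freqs)
            delta = max(abs(freqs.get(k, 0.0) - prev.get(k, 0.0)) for k in keys)
            if delta <= tol:
                return n
        prev = freqs
    return None


if __name__ == "__main__":
    # Toy input: single characters stand in for phoneme symbols.
    toy = list("abab" * 10)
    print(stabilization_point(toy, step=4, tol=1e-9))
```

On a real transcription, `step` controls how often the estimate is checked, and a stricter criterion (for example, requiring several consecutive stable checkpoints) may be preferable.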
Italian; phoneme frequency
03 Monograph::03a Essay, Scientific Treatise
Estimation of the Frequency of Occurrence of Italian Phonemes in Text / Arango, Javi; Decaprio, Alec; Baik, Sunwoo; De Nardis, Luca; Shattuck-Hufnagel, Stefanie; Di Benedetto, Maria Gabriella. - (2021). [10.48550/arxiv.2101.06147]
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1640086