The purpose of this project was to derive a reliable estimate of the frequency of occurrence of the 30 phonemes – plus consonant geminated counterparts- of the Italian language, based on four selected written texts. Since no comparable dataset was found in previous literature, the present analysis may serve as a reference in future studies. Four textual sources were considered: Come si fa una tesi di laurea: le materie umanistiche by Umberto Eco, I promessi sposi by Alessandro Manzoni, a recent article in Corriere della Sera (a popular daily Italian newspaper), and In altre parole by Jhumpa Lahiri. The sources were chosen to represent varied genres, subject matter, time periods, and writing styles. Results of the analysis, which also included an analysis of variance, showed that, for all four sources, the frequencies of occurrence reached relatively stable values after about 6000 phonemes (approximately 1250 words), varying by <0.025%. Estimated frequencies are provided for each single source and as an average across sources.
Estimation of the frequency of occurence of italian phonemes in text / Arango, Javier; Decaprio, Alec; Yao, Stephanie; Baik, Sunwoo; Shattuck-Hufnagel, Stefanie; Di Benedetto, Maria-Gabriella. - In: THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA. - ISSN 0001-4966. - 148:4(2020), pp. 2809-2809. [10.1121/1.5147826]
Estimation of the frequency of occurence of italian phonemes in text
Di Benedetto, Maria-Gabriella
2020
Abstract
The purpose of this project was to derive a reliable estimate of the frequency of occurrence of the 30 phonemes – plus consonant geminated counterparts- of the Italian language, based on four selected written texts. Since no comparable dataset was found in previous literature, the present analysis may serve as a reference in future studies. Four textual sources were considered: Come si fa una tesi di laurea: le materie umanistiche by Umberto Eco, I promessi sposi by Alessandro Manzoni, a recent article in Corriere della Sera (a popular daily Italian newspaper), and In altre parole by Jhumpa Lahiri. The sources were chosen to represent varied genres, subject matter, time periods, and writing styles. Results of the analysis, which also included an analysis of variance, showed that, for all four sources, the frequencies of occurrence reached relatively stable values after about 6000 phonemes (approximately 1250 words), varying by <0.025%. Estimated frequencies are provided for each single source and as an average across sources.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.