Moving to continuous classifications of bilingualism through machine learning trained on language production / Coco, M. I.; Smith, G.; Spelorzi, R.; Garraffa, M. - In: BILINGUALISM. - ISSN 1469-1841. - (2024), pp. 1-9. [10.1017/s1366728924000361]

Moving to continuous classifications of bilingualism through machine learning trained on language production

Coco, M. I. (first author)
2024

Abstract

Recent conceptualisations of bilingualism are moving away from strict categorisations towards continuous approaches. This study supports this trend by combining empirical psycholinguistic data with machine learning classification modelling. Support vector classifiers were trained on two datasets of coded productions by Italian speakers to predict the class each speaker belonged to ("monolingual", "attriter" or "heritage"). All classes can be predicted above chance (>33%), although the classifier's performance varies substantially across classes: monolinguals are identified much better (F-score >70%) than attriters (F-score <50%), who are the most confusable class. Further analyses of the classification errors in the confusion matrices show that attriters are identified as heritage speakers nearly as often as they are correctly classified. Cluster clitics are the most informative features for classification. Overall, this study supports a conceptualisation of bilingualism as a continuum of linguistic behaviours rather than a set of classes established a priori.
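To make the link between the confusion matrix and the per-class F-scores concrete, the following is a minimal sketch and not the authors' analysis code. The counts in `CONFUSION` are hypothetical values invented for illustration (they are not results from the study); they are chosen only so that attriters are labelled as heritage speakers nearly as often as they are labelled correctly. The function name `per_class_scores` is likewise an assumption.

```python
# Illustrative only: the counts below are HYPOTHETICAL, not data from the study.
# Rows are true classes, columns are predicted classes, both in LABELS order.
LABELS = ["monolingual", "attriters", "heritage"]
CONFUSION = [
    [80, 10, 10],  # true monolinguals
    [20, 42, 38],  # true attriters: 38 mislabelled as heritage vs 42 correct
    [15, 25, 60],  # true heritage speakers
]


def per_class_scores(matrix, labels):
    """Return precision, recall and F-score (F1) for each class."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                        # correctly classified
        fn = sum(matrix[i]) - tp                 # true class i, predicted as another class
        fp = sum(row[i] for row in matrix) - tp  # other classes predicted as class i
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


scores = per_class_scores(CONFUSION, LABELS)
for label in LABELS:
    s = scores[label]
    print(f"{label:12s} precision={s['precision']:.2f} "
          f"recall={s['recall']:.2f} f1={s['f1']:.2f}")
```

With counts of this shape, the attriter class ends up with the lowest F-score because its row is split almost evenly between the correct label and "heritage", which is the kind of pattern the abstract describes.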
attrition; bilingualism; classification; heritage speakers; support vector machine
01 Journal publication::01a Journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1731710