Moving to continuous classifications of bilingualism through machine learning trained on language production

Coco, M. I.; Smith, G.; Spelorzi, R.; Garraffa, M.

doi:10.1017/s1366728924000361

Recent conceptualisations of bilingualism are moving away from strict categorisations, towards continuous approaches. This study supports this trend by combining empirical psycholinguistic data with machine learning classification modelling. Support vector classifiers were trained on two datasets of coded productions by Italian speakers to predict the class each speaker belonged to ("monolingual", "attriters" and "heritage"). All classes can be predicted above chance (>33%), although the classifier's performance varies substantially: monolinguals are identified much better (f-score >70%) than attriters (f-score <50%), who are the most confusable class. Further analysis of the classification errors in the confusion matrices shows that attriters are identified as heritage speakers nearly as often as they are correctly classified. Cluster clitics are the features that contribute most to classification performance. Overall, this study supports a conceptualisation of bilingualism as a continuum of linguistic behaviours rather than sets of a priori established classes.
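The quantities mentioned in the abstract (chance level, per-class f-score, and off-diagonal confusions) can be illustrated with a minimal sketch. The counts below are hypothetical and chosen only to show the arithmetic; they are not the results reported in the paper, and the names `LABELS`, `CONFUSION`, `per_class_scores` and `chance_level` are assumptions made for this example. The chance level of 1/3 additionally assumes three equally sized classes.

```python
# Hypothetical counts for illustration only; NOT the results reported in the paper.
LABELS = ["monolingual", "attriters", "heritage"]

# Rows are true classes, columns are predicted classes (same order as LABELS).
CONFUSION = [
    [40, 5, 5],    # true monolingual
    [8, 22, 20],   # true attriters
    [6, 14, 30],   # true heritage
]


def per_class_scores(matrix, labels):
    """Return {label: (precision, recall, f1)} from a square confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted_as_class = sum(row[i] for row in matrix)  # column total
        actually_in_class = sum(matrix[i])                  # row total
        precision = true_positives / predicted_as_class if predicted_as_class else 0.0
        recall = true_positives / actually_in_class if actually_in_class else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def chance_level(labels):
    """Chance accuracy for a uniform guess, assuming balanced classes."""
    return 1 / len(labels)


scores = per_class_scores(CONFUSION, LABELS)
for label, (precision, recall, f1) in scores.items():
    print(f"{label:12s} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"chance level = {chance_level(LABELS):.2f}")
```

Reading a row of the matrix shows where a class is confused: in this made-up example, the second row has almost as many items in the "heritage" column as on the diagonal, which is the kind of pattern the abstract describes for attriters.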

Moving to continuous classifications of bilingualism through machine learning trained on language production / Coco, M.I., Smith, G., Spelorzi, R., Garraffa, M. - In: BILINGUALISM. - ISSN 1469-1841. - (2024), pp. 1-9. [10.1017/s1366728924000361]

Year of publication: 2024
Keywords: attrition; bilingualism; classification; heritage speakers; support vector machine
Type: 01 Journal publication::01a Journal article
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1731710
