Quaternion anti-transfer learning for speech emotion recognition

Guizzo, E.; Weyde, T.; Tarroni, G.; Comminiello, D.

doi:10.1109/WASPAA58266.2023.10248082

This study explores the benefits of anti-transfer learning with quaternion neural networks for robust, effective, and efficient speech emotion recognition. Anti-transfer learning selectively promotes task invariance through the introduction of a deep feature loss at training time. It has been shown to improve the performance of speech emotion recognition models by encouraging the independence of emotion predictions from specific uttered words and characteristics of the speaker's voice. However, the improved accuracy comes at a cost of increased computation time and memory requirements. In order to reduce the resource demand of anti-transfer, we propose to exploit quaternion-valued processing. We design, implement, and evaluate the use of quaternion anti-transfer learning on the basis of the VGG16 architecture and quaternion embeddings on multiple datasets for different speech emotion recognition task setups. The effectiveness of this approach depends on the layer where it is applied, with early layers offering a good compromise between performance gain and resource requirements. Our results show that anti-transfer in the quaternion domain can enhance generalisation while reducing the model's demand for computation and memory.
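The deep feature loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the penalty is the squared cosine similarity between Gram matrices of two feature maps taken at the same layer: one from the emotion model being trained, and one from a frozen network pre-trained on the task to become invariant to (for example word or speaker recognition). The names `gram_matrix`, `anti_transfer_penalty`, `total_loss`, and the weight `beta` are hypothetical, and the quaternion-valued processing is not modelled here.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel Gram matrix of a (channels, time, freq) feature map."""
    channels = features.shape[0]
    flat = features.reshape(channels, -1)
    # Normalise by the number of positions so the scale does not depend on map size.
    return flat @ flat.T / flat.shape[1]


def anti_transfer_penalty(task_feats, pretrained_feats, eps=1e-8):
    """Squared cosine similarity between the two Gram matrices, in [0, 1].

    Assumption: similarity is measured on Gram matrices; the loss used in
    the paper may differ.
    """
    g_task = gram_matrix(task_feats).ravel()
    g_pre = gram_matrix(pretrained_feats).ravel()
    cos = g_task @ g_pre / (np.linalg.norm(g_task) * np.linalg.norm(g_pre) + eps)
    return cos ** 2


def total_loss(task_loss, task_feats, pretrained_feats, beta=1.0):
    """Task loss plus a weighted penalty for resembling the pre-trained features."""
    return task_loss + beta * anti_transfer_penalty(task_feats, pretrained_feats)
```

Under these assumptions, minimising `total_loss` pushes the emotion model's features away from those of the frozen network, which is one way to read "encouraging the independence of emotion predictions" from words and speaker characteristics. Applying the penalty at an early layer keeps the extra forward pass through the frozen network short.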

Quaternion anti-transfer learning for speech emotion recognition / Guizzo, E.; Weyde, T.; Tarroni, G.; Comminiello, D. (2023), pp. 1-5. (Paper presented at the 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023, held in New Paltz, NY, USA) [10.1109/WASPAA58266.2023.10248082].

Year of publication: 2023
Conference name: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023
Keywords: anti-transfer learning; quaternion neural networks; speech emotion recognition
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Belongs to type: 04b Conference paper in volume

Files attached to this record

File	Size	Format
Guizzo_Quaternion Anti-Transfer_2023.pdf (archive managers only; Note: Article; Type: Publisher's version, published with the publisher's layout; License: All rights reserved)	959.46 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1693482
