Stride Based Convolutional Neural Network for Speech Emotion Recognition / Wani, T. M.; Gunawan, T. S.; Qadri, S. A. A.; Mansor, H.; Arifin, F.; Ahmad, Y. A. - (2021), pp. 41-46. (Paper presented at the 2021 IEEE 7th International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA), held in Bandung, Indonesia) [10.1109/ICSIMA50015.2021.9526320].

Stride Based Convolutional Neural Network for Speech Emotion Recognition

Wani T. M. (First author; Writing – Original Draft Preparation)
2021

Abstract

Speech Emotion Recognition (SER) identifies the emotional content of speech signals regardless of their semantic content. Deep learning techniques have proven superior to conventional techniques for emotion recognition owing to advantages such as speed, scalability, and versatility. However, since emotions are subjective, there is no universal agreement on how to evaluate or categorize them. The main objective of this paper is to design a suitable Convolutional Neural Network (CNN) model, the Stride-based Convolutional Neural Network (SCNN), which uses a smaller number of convolutional layers and eliminates the pooling layers to increase computational stability. This elimination tends to increase the accuracy and decrease the computational time of the SER system. Instead of pooling layers, deep strides are used for the necessary dimensionality reduction. The SCNN is trained on spectrograms generated from the speech signals of two databases, Berlin (Emo-DB) and IITKGP-SEHSC. Four emotions (angry, happy, neutral, and sad) were considered for evaluation, and validation accuracies of 90.67% and 91.33% were achieved for Emo-DB and IITKGP-SEHSC, respectively. This study provides new benchmarks for both datasets, demonstrating the feasibility and relevance of the presented SER technique.
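The central architectural idea, replacing pooling layers with strided convolutions that learn their own downsampling, can be illustrated with a short sketch. The snippet below is a minimal reconstruction in Keras, assuming 128x128 log-mel spectrogram inputs; the helper names (wav_to_spectrogram, build_scnn) and all filter counts, stride values, and other hyperparameters are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Minimal SER sketch: spectrogram features + a stride-based CNN (SCNN).
# All hyperparameters below are illustrative assumptions, not the
# paper's exact architecture.
import numpy as np
import librosa
from tensorflow.keras import layers, models

def wav_to_spectrogram(path, sr=16000, n_mels=128, frames=128):
    """Load a speech file and return a fixed-size log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel, ref=np.max)
    # Pad or truncate along the time axis to a fixed number of frames.
    if logmel.shape[1] < frames:
        logmel = np.pad(logmel, ((0, 0), (0, frames - logmel.shape[1])))
    return logmel[:, :frames]

def build_scnn(input_shape=(128, 128, 1), num_classes=4):
    """Strided convolutions downsample in place of pooling layers."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        # Each stride-2 convolution halves the spatial resolution,
        # so no pooling layers are needed anywhere in the network.
        layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(128, 3, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        # Four output classes: angry, happy, neutral, sad.
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_scnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Because each stride-2 convolution learns its own downsampling filter while halving the feature-map resolution, the network avoids the fixed, parameter-free subsampling of max pooling; this is the property the abstract credits for the gains in accuracy and computational time.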
2021
2021 IEEE 7th International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA)
Speech Emotion Recognition (SER); Stride-based Convolutional Neural Networks (SCNN); Strides; Spectrograms
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this product
  File: Wani_Stride_2021.pdf (restricted: archive administrators only; contact the author for access)
  Type: Publisher's version (published with the publisher's layout)
  License: All rights reserved
  Size: 980.49 kB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1714011
Citations
  • Scopus: 8