Speech emotion recognition using convolution neural networks and deep stride convolutional neural networks / Wani, T. M.; Gunawan, T. S.; Qadri, S. A. A.; Mansor, H.; Kartiwi, M.; Ismail, N. (2020). (Paper presented at the 6th International Conference on Wireless and Telematics, ICWT 2020, held in Yogyakarta) [10.1109/ICWT50448.2020.9243622].
Speech emotion recognition using convolution neural networks and deep stride convolutional neural networks
Wani T. M.
First author; Writing – Original Draft Preparation
2020
Abstract
An assortment of techniques has been presented in the area of Speech Emotion Recognition (SER), where the main focus is to recognize the salient discriminants and useful features of speech signals. These features are then classified to recognize the specific emotion of a speaker. In recent times, deep learning techniques have emerged as a breakthrough in speech emotion recognition for detecting and classifying emotions. In this paper, we modified a recently developed variant of convolutional neural networks, the Deep Stride Convolutional Neural Network (DSCNN), by using a smaller number of convolutional layers to increase computational speed while still maintaining accuracy. In addition, we trained a state-of-the-art CNN model and the proposed DSCNN on spectrograms generated from the SAVEE speech emotion dataset. For the evaluation, four emotions (angry, happy, neutral, and sad) were considered. The results show that the proposed DSCNN architecture, with a prediction accuracy of 87.8%, outperforms the CNN, which reaches 79.4%.
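To illustrate the general idea described in the abstract, the sketch below shows a compact CNN that classifies spectrogram images into the four SAVEE emotions and downsamples with strided convolutions rather than pooling layers, which is the characteristic feature of stride-based (DSCNN-style) architectures. The layer counts, filter sizes, and input shape are illustrative assumptions; the abstract does not give the exact configuration used in the paper.

```python
# Minimal sketch of a stride-based CNN spectrogram classifier (DSCNN-style).
# Layer counts, filter sizes, and the 128x128 input shape are assumptions,
# not the configuration reported in the paper.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4  # angry, happy, neutral, sad (the emotions evaluated on SAVEE)

def build_dscnn(input_shape=(128, 128, 1)):
    """Small CNN that reduces spatial resolution with strided convolutions
    instead of pooling layers."""
    return models.Sequential([
        layers.Input(shape=input_shape),  # single-channel spectrogram image
        layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(128, 3, strides=2, padding="same", activation="relu"),
        layers.GlobalAveragePooling2D(),  # assumed head: pool then classify
        layers.Dense(NUM_CLASSES, activation="softmax"),  # one score per emotion
    ])

model = build_dscnn()
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Replacing pooling with strided convolutions lets the network learn its own downsampling while keeping the layer count low, which is consistent with the abstract's goal of faster computation at comparable accuracy.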
File | Size | Format | |
---|---|---|---|
Wani_Speech-Emotion_2020.pdf (archive managers only; publisher's version with the publisher's layout; All rights reserved) | 254.35 kB | Adobe PDF | Contact the author |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.