
Emotional sounds of crowds: spectrogram-based analysis using deep learning / Franzoni, V.; Biondi, G.; Milani, A.. - In: MULTIMEDIA TOOLS AND APPLICATIONS. - ISSN 1380-7501. - 79:47-48(2020), pp. 36063-36075. [10.1007/s11042-020-09428-x]

Emotional sounds of crowds: spectrogram-based analysis using deep learning

Franzoni, V.; Biondi, G.; Milani, A.
2020

Abstract

Crowds express emotions as a collective individual, as is evident from the sounds a crowd produces at particular events, e.g., collective booing, laughing, or cheering at sports matches, movies, theaters, concerts, political demonstrations, and riots. A critical question concerning the novel concept of crowd emotion is whether the emotional content of crowd sounds can be characterized by frequency-amplitude features, using analysis techniques similar to those applied to individual voices, in which deep-learning classifiers operate on spectrogram images derived from sound transformations. In this work, we present a technique based on generating sound spectrograms from fixed-length fragments extracted from audio clips recorded at high-attendance events, where the crowd acts as a collective individual. Transfer learning is applied to a convolutional neural network (CNN) whose low-level features were pre-trained on the well-known ImageNet dataset of visual knowledge. The original sound clips are filtered and amplitude-normalized for correct spectrogram generation, on which the domain-specific features are fine-tuned. Experiments on the fully trained CNN show the promising performance of the proposed model in classifying crowd emotions.
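The preprocessing pipeline the abstract outlines (cutting fixed-length fragments, normalizing amplitude, and turning each fragment into a log-power spectrogram image for the CNN) could be sketched as follows. This is a minimal illustration, not the authors' actual code: the function name, sampling rate, fragment length, and FFT parameters are assumptions chosen for the example.

```python
import numpy as np
from scipy import signal

def fragment_to_spectrogram(clip, sr=22050, frag_seconds=2.0):
    """Illustrative sketch: take one fixed-length fragment from a crowd
    audio clip, normalize its amplitude, and compute a log-power
    spectrogram image suitable as CNN input."""
    n = int(sr * frag_seconds)
    frag = clip[:n]                                  # fixed-length fragment
    frag = frag / (np.max(np.abs(frag)) + 1e-9)      # amplitude normalization
    # Short-time Fourier analysis; 256-sample windows give 129 frequency bins
    f, t, Sxx = signal.spectrogram(frag, fs=sr, nperseg=256)
    return 10.0 * np.log10(Sxx + 1e-10)              # log-power "image"

# Usage on a synthetic 3-second clip (a 440 Hz tone standing in for audio)
clip = np.sin(2 * np.pi * 440 * np.arange(22050 * 3) / 22050)
img = fragment_to_spectrogram(clip)
```

In a transfer-learning setup like the one described, each such spectrogram would be rendered as an image and fed to an ImageNet-pretrained CNN whose final layers are fine-tuned on the crowd-emotion classes.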
2020
CNN; Crowd computing; Crowd emotions; Emotion recognition; Image recognition; Transfer learning
01 Journal publication::01a Journal article
Files attached to this item
File: Franzoni_Emotional-sounds_2020.pdf
Access: open access
Type: Publisher's version (published with the publisher's layout)
License: Creative Commons
Size: 840.5 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11573/1734333
Citations
  • PMC: not available
  • Scopus: 41
  • Web of Science: 29