Human feelings expressed through verbal (e.g. voice) and non-verbal communication channels (e.g. face or body) can influence either human actions or interactions. In the literature, most of the attention was given to facial expressions for the analysis of emotions conveyed through non-verbal behaviors. Despite this, psychology highlights that the body is an important indicator of the human affective state in performing daily life activities. Therefore, this paper presents a novel method for affective action and interaction recognition from videos, exploiting multi-view representation learning and only full-body handcrafted characteristics selected following psychological and proxemic studies. Specifically, 2D skeletal data are extracted from RGB video sequences to derive diverse low-level skeleton features, i.e. multi-views, modeled through the bag-of-visual-words clustering approach generating a condition-related codebook. In this way, each affective action and interaction within a video can be represented as a frequency histogram of codewords. During the learning phase, for each affective class, training samples are used to compute its global histogram of codewords stored in a database and later used for the recognition task. In the recognition phase, the video frequency histogram representation is matched against the database of class histograms and classified as the closest affective class in terms of Euclidean distance. The effectiveness of the proposed system is evaluated on a specifically collected dataset containing 6 emotion for both actions and interactions, on which the proposed system obtains 93.64% and 90.83% accuracy, respectively. In addition, the devised strategy also achieves in line performances with other literature works based on deep learning when tested on a public collection containing 6 emotions plus a neutral state, demonstrating the effectiveness of the presented approach and confirming the findings in psychological and proxemic studies.

Affective Action and Interaction Recognition by Multi-View Representation Learning from Handcrafted Low-Level Skeleton Features / Avola, D.; Cascio, M.; Cinque, L.; Fagioli, A.; Foresti, G. L.. - In: INTERNATIONAL JOURNAL OF NEURAL SYSTEMS. - ISSN 0129-0657. - 32:10(2022), p. 2250040. [10.1142/S012906572250040X]

Affective Action and Interaction Recognition by Multi-View Representation Learning from Handcrafted Low-Level Skeleton Features

Avola D.
Primo
;
Cascio M.;Cinque L.;Fagioli A.;Foresti G. L.
2022

Abstract

Human feelings expressed through verbal (e.g. voice) and non-verbal communication channels (e.g. face or body) can influence either human actions or interactions. In the literature, most of the attention was given to facial expressions for the analysis of emotions conveyed through non-verbal behaviors. Despite this, psychology highlights that the body is an important indicator of the human affective state in performing daily life activities. Therefore, this paper presents a novel method for affective action and interaction recognition from videos, exploiting multi-view representation learning and only full-body handcrafted characteristics selected following psychological and proxemic studies. Specifically, 2D skeletal data are extracted from RGB video sequences to derive diverse low-level skeleton features, i.e. multi-views, modeled through the bag-of-visual-words clustering approach generating a condition-related codebook. In this way, each affective action and interaction within a video can be represented as a frequency histogram of codewords. During the learning phase, for each affective class, training samples are used to compute its global histogram of codewords stored in a database and later used for the recognition task. In the recognition phase, the video frequency histogram representation is matched against the database of class histograms and classified as the closest affective class in terms of Euclidean distance. The effectiveness of the proposed system is evaluated on a specifically collected dataset containing 6 emotion for both actions and interactions, on which the proposed system obtains 93.64% and 90.83% accuracy, respectively. In addition, the devised strategy also achieves in line performances with other literature works based on deep learning when tested on a public collection containing 6 emotions plus a neutral state, demonstrating the effectiveness of the presented approach and confirming the findings in psychological and proxemic studies.
2022
Affective action; affective interaction; bag-of-visual-words; handcrafted low-level skeleton features; multi-view representation learning
01 Pubblicazione su rivista::01a Articolo in rivista
Affective Action and Interaction Recognition by Multi-View Representation Learning from Handcrafted Low-Level Skeleton Features / Avola, D.; Cascio, M.; Cinque, L.; Fagioli, A.; Foresti, G. L.. - In: INTERNATIONAL JOURNAL OF NEURAL SYSTEMS. - ISSN 0129-0657. - 32:10(2022), p. 2250040. [10.1142/S012906572250040X]
File allegati a questo prodotto
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1665272
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? 0
  • Scopus 9
  • ???jsp.display-item.citation.isi??? 8
social impact