Multimodal Automatic Acute Pain Recognition Using Facial Expressions and Physiological Signals / Farmani, J.; Giuseppi, A.; Bargshady, G.; Fernandez Rojas, R. - 2286:(2025), pp. 49-62. (31st International Conference on Neural Information Processing, ICONIP 2024, Auckland, New Zealand) [10.1007/978-981-96-6960-8_4].
Multimodal Automatic Acute Pain Recognition Using Facial Expressions and Physiological Signals
Giuseppi A.;
2025
Abstract
Accurate and objective pain assessment is crucial for effective pain management. This paper proposes a novel multimodal deep learning framework for automatic pain detection using a hybrid architecture with feature-level fusion. The framework leverages multimodal data, including facial expressions and physiological signals (EDA and ECG), from the BioVid Heat Pain database (Part A). The hybrid architecture consists of two streams: stream 1 employs an attention-based CNN-LSTM to extract features from facial expression videos, capture temporal dependencies, and focus on relevant aspects of the video data, while stream 2 uses an LSTM to capture temporal patterns in the physiological signals. The performance of the proposed model was examined in both unimodal and multimodal settings. In a binary classification task distinguishing No Pain from Severe Pain, electrodermal activity (EDA) outperformed all other single data sources, achieving high average accuracy (83.05% for 67 subjects and 82.69% for 87 subjects) and F1-scores (81.66 and 80.18, respectively) using k-fold cross-validation. The multimodal setting (Video + EDA) achieved higher accuracy (84.15% for 67 subjects and 83.35% for 87 subjects) and F1-scores (82.86 and 82.36, respectively).

| File | Access | Type | License | Size | Format | Note |
|---|---|---|---|---|---|---|
| Farmani_Multimodal_preprint_2025.pdf | Open access | Pre-print (manuscript submitted to the publisher, prior to peer review) | All rights reserved | 1.36 MB | Adobe PDF | https://doi.org/10.1007/978-981-96-6960-8_4 |
| Farmani_Multimodal_2025.pdf | Archive managers only | Publisher's version (published version with the publisher's layout) | All rights reserved | 2.08 MB | Adobe PDF | |
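
The abstract reports accuracy and F1-score averaged over k-fold cross-validation. As a rough illustration of how such averages can be computed for a binary No Pain vs. Severe Pain task, here is a minimal sketch in plain Python. The label encoding (0 = No Pain, 1 = Severe Pain), the helper names, and the toy folds are assumptions made for this example only. They are not taken from the paper, which may pool or average its scores differently.

```python
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the class `positive` (assumed here to be Severe Pain)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validated_scores(folds):
    """Average per-fold accuracy and F1 over (y_true, y_pred) pairs."""
    accs = [accuracy(t, p) for t, p in folds]
    f1s = [f1_score(t, p) for t, p in folds]
    return mean(accs), mean(f1s)


# Toy held-out predictions for two folds (illustrative values only).
toy_folds = [
    ([0, 0, 1, 1], [0, 1, 1, 1]),
    ([0, 1, 0, 1], [0, 1, 0, 0]),
]

mean_acc, mean_f1 = cross_validated_scores(toy_folds)
print(f"mean accuracy = {mean_acc:.4f}, mean F1 = {mean_f1:.4f}")
```

Averaging per-fold scores, as done here, is one common convention. Pooling all held-out predictions before computing a single score is another, and the two can give slightly different numbers.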
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


