A Multimodal Visual Sentiment Analysis Framework Enhanced With Feature Pyramid Networks / Galletti, D.; Ponzi, V.; Russo, S. - 3984 (2025), pp. 55-63. (10th International Conference of Yearly Reports on Informatics, Mathematics, and Engineering, ICYRIME 2025).
A Multimodal Visual Sentiment Analysis Framework Enhanced With Feature Pyramid Networks
Ponzi V. (second author), Methodology
Russo S. (last author), Supervision
2025
Abstract
Visual Sentiment Analysis aims to understand how images affect people in terms of evoked emotions. This paper presents a complete pipeline for comparing users’ emotional responses to images, enabling the analysis of potential discrepancies between machine-inferred and subjective affective states. The proposed framework consists of three main stages. The first stage employs a Convolutional Neural Network (CNN) enhanced with Feature Pyramid Network (FPN) layers to extract multi-scale visual features. Experimental results show that incorporating three additional FPN layers improves performance while introducing only a negligible increase in model complexity. In the second stage, a multimodal approach is adopted, where visual features are integrated with textual features derived from captions generated by an Image Captioning model. This fusion enriches the emotional context by combining visual and linguistic cues. In the final stage, a grounding mechanism is applied to align and merge sentiments from the different modalities into a unified representation. The algorithm’s output is then compared with the sentiment expressed by the user, enabling an analysis of the divergence between machine-inferred and human-perceived emotions.
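
As a purely illustrative aid, the sketch below shows one way the first two stages described in the abstract could be wired together in PyTorch: a small CNN backbone with three FPN-style lateral/top-down layers whose pooled multi-scale features are fused with a caption embedding before sentiment classification. This is a minimal sketch under assumed dimensions and a three-class output, not the authors' implementation; the module names (TinyFPNEncoder, MultimodalSentimentNet) and all parameters are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPNEncoder(nn.Module):
    """CNN backbone with three extra FPN-style lateral/top-down layers (illustrative)."""
    def __init__(self, dim=64):
        super().__init__()
        # Bottom-up pathway: three stages with increasing stride.
        self.c1 = nn.Sequential(nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU())
        self.c2 = nn.Sequential(nn.Conv2d(dim, dim * 2, 3, stride=2, padding=1), nn.ReLU())
        self.c3 = nn.Sequential(nn.Conv2d(dim * 2, dim * 4, 3, stride=2, padding=1), nn.ReLU())
        # Lateral 1x1 convolutions project every stage to a common channel width.
        self.l1 = nn.Conv2d(dim, dim, 1)
        self.l2 = nn.Conv2d(dim * 2, dim, 1)
        self.l3 = nn.Conv2d(dim * 4, dim, 1)

    def forward(self, x):
        f1 = self.c1(x)
        f2 = self.c2(f1)
        f3 = self.c3(f2)
        # Top-down pathway: upsample and add, as in a Feature Pyramid Network.
        p3 = self.l3(f3)
        p2 = self.l2(f2) + F.interpolate(p3, size=f2.shape[-2:], mode="nearest")
        p1 = self.l1(f1) + F.interpolate(p2, size=f1.shape[-2:], mode="nearest")
        # Global-pool each pyramid level and concatenate into one visual vector.
        pooled = [F.adaptive_avg_pool2d(p, 1).flatten(1) for p in (p1, p2, p3)]
        return torch.cat(pooled, dim=1)  # shape: (batch, 3 * dim)

class MultimodalSentimentNet(nn.Module):
    """Fuses the multi-scale visual vector with a caption embedding and predicts sentiment."""
    def __init__(self, dim=64, text_dim=128, n_classes=3):
        super().__init__()
        self.visual = TinyFPNEncoder(dim)
        self.fuse = nn.Sequential(
            nn.Linear(3 * dim + text_dim, 128), nn.ReLU(), nn.Linear(128, n_classes)
        )

    def forward(self, image, caption_emb):
        v = self.visual(image)
        return self.fuse(torch.cat([v, caption_emb], dim=1))

# Usage: caption_emb would come from the text side of an Image Captioning model.
model = MultimodalSentimentNet()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 128))
print(logits.shape)  # torch.Size([2, 3])

In the full pipeline described in the abstract, the third stage would then ground and merge the visual and textual sentiment cues into a unified prediction, which is finally compared against the sentiment reported by the user.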


