Large-scale Benchmarks for Multimodal Recommendation with Ducho / Attimonelli, Matteo; Danese, Danilo; Di Fazio, Angela; Malitesta, Daniele; Pomo, Claudio; Di Noia, Tommaso. - In: EXPERT SYSTEMS WITH APPLICATIONS. - ISSN 0957-4174. - (2025). [10.1016/j.eswa.2025.130813]
Large-scale Benchmarks for Multimodal Recommendation with Ducho
Matteo Attimonelli (First Author)
2025
Abstract
With the advent of deep learning and, more recently, large models, recommender systems have greatly refined their ability to profile users’ preferences and interests, which are often complex to disentangle. This is especially true for recommendation algorithms that rely heavily on external side information, such as multimodal recommender systems. In domains like fashion, music, and movie recommendation, the multi-faceted features characterizing products and services may influence each customer on online platforms differently, paving the way for novel multimodal recommendation models that can learn from such multimodal content. According to the literature, the common multimodal recommendation pipeline involves (i) extracting multimodal features, (ii) refining their high-level representations to suit the recommendation task, (iii) optionally fusing all multimodal features, and (iv) predicting the user-item score. Although great effort has been devoted to designing optimal solutions for (ii)-(iv), to the best of our knowledge, very little attention has been paid to exploring procedures for (i) in a rigorous way. In this respect, the existing literature highlights the wide availability of multimodal datasets and the ever-growing number of large models for multimodal tasks, yet, at the same time, an unjustified adoption of a narrow set of standardized extraction solutions. As very recent works have begun to conduct empirical studies assessing the contribution of multimodality to recommendation, we follow and complement this research direction. To this end, this paper stands as the first attempt to offer a large-scale benchmark for multimodal recommender systems, with a specific focus on multimodal feature extractors. Specifically, we take advantage of three popular and recent frameworks, namely Ducho for multimodal feature extraction and MMRec and Elliot for reproducible recommendation, to offer a unified, ready-to-use experimental environment able to run extensive benchmarking analyses leveraging novel multimodal feature extractors. Results, extensively validated across different extractors, extractor hyper-parameters, domains, and modalities, provide important insights on how to train and tune the next generation of multimodal recommendation algorithms.
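
To make pipeline step (i) concrete, the following is a minimal, illustrative sketch of multimodal feature extraction for a single catalogue item, using off-the-shelf pre-trained backbones (a ResNet-50 for images and a MiniLM sentence encoder for text). It is not Ducho's actual configuration or API; the model choices and the extract_item_features helper are hypothetical placeholders, meant only to show how swapping extractors and their settings changes the features that feed steps (ii)-(iv).

    # Illustrative sketch of pipeline step (i): extracting visual and textual
    # item features with pre-trained backbones. Not Ducho's API; model names
    # and the helper below are hypothetical placeholders.
    import torch
    from PIL import Image
    from torchvision import models
    from transformers import AutoModel, AutoTokenizer

    # Visual extractor: pre-trained ResNet-50 with its classification head removed
    weights = models.ResNet50_Weights.IMAGENET1K_V2
    resnet = models.resnet50(weights=weights)
    visual_extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()
    preprocess = weights.transforms()

    # Textual extractor: pre-trained sentence encoder
    tok = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
    text_encoder = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2").eval()

    @torch.no_grad()
    def extract_item_features(image_path: str, description: str):
        """Return (visual, textual) embeddings for one catalogue item."""
        # Visual features: pooled ResNet-50 activations, shape (2048,)
        img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        visual = visual_extractor(img).flatten()
        # Textual features: mean-pooled token embeddings, shape (384,)
        enc = tok(description, return_tensors="pt", truncation=True)
        textual = text_encoder(**enc).last_hidden_state.mean(dim=1).squeeze(0)
        return visual, textual

    # Hypothetical usage; the resulting embeddings would then be refined, fused,
    # and scored by the downstream recommender (steps (ii)-(iv)):
    # visual, textual = extract_item_features("item_001.jpg", "Red leather ankle boots")

In a benchmarking setting such as the one the paper describes, the two backbones above would be treated as experimental variables, replaced by alternative extractors and re-run across domains and modalities.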


