Recommendation pipelines involve several stages that can critically affect performance and reproducibility. However, early pipeline stages remain under-standardized, limiting comparability and interoperability across studies. This tutorial addresses this gap by providing both theoretical insights and hands-on experience with tools and practices for standardized data processing in recommender systems. In the first part, we introduce DataRec, a Python library for reproducible and interoperable data management, and discuss data filtering, splitting, and topological analysis techniques. In the second part, we explore multimodal feature extraction in domains such as fashion, music, and movies, focusing on the challenges of meaningful multimodal integration. We introduce Ducho, a unified framework for extracting audio, visual, and textual features using modern backends, and demonstrate its integration with the evaluation framework Elliot. The tutorial targets researchers and practitioners with an interest in recommender systems, data preprocessing, and multimodal modeling. All materials, including slides, code, datasets, and recordings, will be openly available on a dedicated tutorial website: https://sites.google.com/view/dd4rec-tutorial/.
Standard Practices for Data Processing and Multimodal Feature Extraction in Recommendation with DataRec and Ducho (D&D4Rec) / Mancino, Alberto Carlo Maria; Attimonelli, Matteo; Di Fazio, Angela; Malitesta, Daniele; Di Noia, Tommaso. - (2025), pp. 1432-1434. (19th ACM Conference on Recommender Systems, RecSys 2025, Prague, Czech Republic) [10.1145/3705328.3748009].
Standard Practices for Data Processing and Multimodal Feature Extraction in Recommendation with DataRec and Ducho (D&D4Rec)
Mancino, Alberto Carlo Maria; Attimonelli, Matteo
2025


