Robust Solutions for Ranking Variability in Recommender Systems / Francomano, Bonifacio Marco; Siciliano, Federico; Silvestri, Fabrizio. - 3924:(2024). (Paper presented at the Workshop on Design, Evaluation, and Deployment of Robust Recommender Systems 2024 (RobustRecSys 2024), held in Bari, Italy.)
Robust Solutions for Ranking Variability in Recommender Systems
Bonifacio Marco Francomano; Federico Siciliano; Fabrizio Silvestri (Supervision)
2024
Abstract
In the field of recommender systems, an important issue affecting current state-of-the-art models is the inconsistency of item rankings produced by models initialized with different weight seeds. Although these models converge and achieve similar average performance metrics, their item rankings differ significantly. This phenomenon is demonstrated quantitatively using metrics such as Rank List Sensitivity (RLS) and Normalized Discounted Cumulative Gain (NDCG) across different model pairs. In this paper, we reaffirm the existence of this problem and provide new insights by analysing two kinds of model pairs: models with common item embeddings but different network initialization, and models with different item embeddings but common network initialization. This lets us identify which network components most influence ranking variability. To address the problem, we propose an ensemble approach that averages the outputs of multiple models. Our ensemble maintains the NDCG of the original model while significantly improving ranking stability: the RLS FRBO@10 value increases by approximately 30.82%.
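The stability comparison and the output-averaging ensemble described in the abstract can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not the authors' implementation.

- **Model scores:** each "model" is simulated as a shared true preference signal plus independent seed-dependent noise.
- **Ranking agreement:** measured with a truncated rank-biased overlap at depth 10. We assume FRBO@10 is a similarity of this kind, where higher means more stable rankings; the paper's exact definition may differ.
- **Ensemble:** built by averaging the item scores of 20 seeded models.
- **Naming:** all function and variable names (`rbo_at_k`, `ensemble_scores`, `seeded_model_scores`, etc.) are hypothetical.

```python
import numpy as np


def top_k(scores, k):
    """Return the indices of the k highest-scoring items, best first."""
    return [int(i) for i in np.argsort(-scores, kind="stable")[:k]]


def rbo_at_k(list_a, list_b, k, p=0.9):
    """Truncated rank-biased overlap at depth k (assumed stand-in for FRBO@k).

    Averages the prefix overlap |A[:d] & B[:d]| / d over depths 1..k with
    geometric weights p**(d-1), normalized so identical lists score 1.0.
    """
    num, den = 0.0, 0.0
    for d in range(1, k + 1):
        overlap = len(set(list_a[:d]) & set(list_b[:d]))
        weight = p ** (d - 1)
        num += weight * overlap / d
        den += weight
    return num / den


def ndcg_at_k(ranking, relevant, k):
    """Binary-relevance NDCG@k."""
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranking[:k]) if item in relevant)
    ideal = sum(1.0 / np.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0


def ensemble_scores(score_list):
    """Average the item scores of several independently seeded models."""
    return np.mean(np.stack(score_list), axis=0)


# --- Toy simulation (hypothetical data, not the paper's experimental setup) ---
K, N_ITEMS, N_MEMBERS = 10, 1000, 20
rng = np.random.default_rng(0)
true_signal = rng.normal(size=N_ITEMS)   # shared "true" user preference
relevant = set(top_k(true_signal, K))    # ground-truth relevant items


def seeded_model_scores(seed):
    """Stand-in for one trained model: true signal plus seed-specific noise."""
    noise = np.random.default_rng(seed).normal(size=N_ITEMS)
    return true_signal + noise


# Two single models that differ only in their seed.
single_a = top_k(seeded_model_scores(1), K)
single_b = top_k(seeded_model_scores(2), K)

# Two ensembles built from disjoint sets of seeds.
ens_a = top_k(ensemble_scores([seeded_model_scores(s) for s in range(100, 100 + N_MEMBERS)]), K)
ens_b = top_k(ensemble_scores([seeded_model_scores(s) for s in range(200, 200 + N_MEMBERS)]), K)

rbo_single = rbo_at_k(single_a, single_b, K)
rbo_ensemble = rbo_at_k(ens_a, ens_b, K)
print(f"RBO@{K} single models: {rbo_single:.3f}")
print(f"RBO@{K} ensembles:     {rbo_ensemble:.3f}")
print(f"NDCG@{K} single: {ndcg_at_k(single_a, relevant, K):.3f}, "
      f"ensemble: {ndcg_at_k(ens_a, relevant, K):.3f}")
```

In this toy, the simulated noise is independent across seeds, so averaging shrinks it. As a result, the two ensembles agree far more often at the top of the list than two single models do. Real networks trained from different seeds share structure, so the improvement observed in practice (such as the 30.82% reported in the abstract) need not match what this simulation shows.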