Research Products Catalog

Redefining Retrieval Evaluation in the Era of LLMs / Trappolini, G., Cuconasu, F., Filice, S., Maarek, Y., Silvestri, F. - (2026), pp. 8359-8375. (EACL 2026 - 19th Conference of the European Chapter of the Association for Computational Linguistics, Rabat, Morocco) [10.18653/v1/2026.eacl-long.391].

Redefining Retrieval Evaluation in the Era of LLMs

Giovanni Trappolini; Florin Cuconasu; Simone Filice; Yoelle Maarek; Fabrizio Silvestri

2026

Abstract

Traditional Information Retrieval (IR) metrics, such as nDCG, MAP, and MRR, assume that human users sequentially examine documents with diminishing attention to lower ranks. This assumption breaks down in Retrieval Augmented Generation (RAG) systems, where search results are consumed by Large Language Models (LLMs), which, unlike humans, process all retrieved documents as a whole rather than sequentially. Additionally, traditional IR metrics do not account for related but irrelevant documents that actively degrade generation quality, rather than merely being ignored. Due to these two major misalignments, namely human vs. machine position discount and human relevance vs. machine utility, classical IR metrics do not accurately predict RAG performance. We introduce a utility-based annotation schema that quantifies both the positive contribution of relevant passages and the negative impact of distracting ones. Building on this foundation, we propose UDCG (Utility and Distraction-aware Cumulative Gain), a metric using an LLM-oriented positional discount to directly optimize the correlation with the end-to-end answer accuracy. Experiments on five datasets and six LLMs demonstrate that UDCG improves correlation by up to 36% compared to traditional metrics. Our work provides a critical step toward aligning IR evaluation with LLM consumers and enables more reliable assessment of RAG components.
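As a point of reference for the classical metrics named in the abstract, the following is a minimal sketch of nDCG and the per-query reciprocal rank (the quantity averaged by MRR) with their usual rank-based discounts. The function names are illustrative and do not come from the paper. UDCG itself is not implemented here, because this record does not give its definition.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with the classical log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains: Sequence[float]) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def reciprocal_rank(relevant: Sequence[bool]) -> float:
    """1 / rank of the first relevant result for one query, 0.0 if none."""
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    # The same relevant passage scores lower when it sits at rank 3.
    print(ndcg([1, 0, 0]), ndcg([0, 0, 1]))
    print(reciprocal_rank([False, False, True]))
    # A related-but-irrelevant passage has gain 0, so it leaves the score unchanged.
    print(ndcg([1, 0]) == ndcg([1]))
```

Both behaviors correspond to the two misalignments the abstract describes: the score depends on rank through a discount modeled on human reading, and a distracting passage can only contribute zero gain, never a penalty.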

Full record

Publication year	2026
Conference name	EACL 2026 - 19th Conference of the European Chapter of the Association for Computational Linguistics
Keywords	RAG; LLM; Information Retrieval
Type	04 Publication in conference proceedings::04b Conference paper in volume
Citation	Redefining Retrieval Evaluation in the Era of LLMs / Trappolini, G., Cuconasu, F., Filice, S., Maarek, Y., Silvestri, F. - (2026), pp. 8359-8375. (EACL 2026 - 19th Conference of the European Chapter of the Association for Computational Linguistics, Rabat, Morocco) [10.18653/v1/2026.eacl-long.391].
Belongs to type	04b Conference paper in volume

Files attached to this product

File	Size	Format
Trappolini_Redefining_2026.pdf (open access). Note: 10.18653/v1/2026.eacl-long.391. Type: Publisher's version (published version with the publisher's layout). License: All rights reserved	510.79 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1764953
