Beyond Relevance: Quantifying Distraction of Irrelevant Passages in RAG / Amiraz, Chen; Cuconasu, Florin; Filice, Simone; Karnin, Zohar. - (2025), pp. 39-44. (Paper presented at the Italian Information Retrieval Workshop 2025, held in Cagliari, Italy).

Beyond Relevance: Quantifying Distraction of Irrelevant Passages in RAG

Florin Cuconasu; 2025

Abstract

Retrieval Augmented Generation (RAG) systems often struggle with irrelevant passages that mislead LLMs during answer generation. This work introduces a comprehensive framework for quantifying and understanding the distracting nature of such passages. We propose a novel metric to measure passage-level distraction effects, demonstrating its robustness across different models. Our methodology combines retrieval-based approaches with controlled synthetic generation techniques that create distracting content spanning multiple categories. Through experimental validation on standard question-answering benchmarks, we show that passages with higher distraction scores consistently degrade model effectiveness, even when relevant content is present. Leveraging this framework, we construct an enhanced training dataset featuring systematically curated distracting passages. When fine-tuned on this dataset, LLMs demonstrate substantial improvements, achieving up to 7.5% accuracy gains over baselines trained on standard RAG data. Our contributions provide both theoretical insights into distraction mechanisms in RAG and practical solutions for developing more robust retrieval-augmented language models.
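The abstract does not spell out how the passage-level distraction metric is computed. As one illustration only, below is a minimal sketch assuming the score is defined as the drop in the model's confidence in the gold answer when a candidate passage is added to the context alongside the relevant passage; the helper `answer_log_prob` and this scoring rule are hypothetical assumptions, not the authors' actual definition, which may aggregate over multiple models as the abstract suggests.

```python
# Hypothetical sketch of a passage-level distraction score: how much does
# adding a candidate passage to the context lower the LLM's log-probability
# of the gold answer? This definition is assumed for illustration only.

from dataclasses import dataclass
from typing import Callable


@dataclass
class QAExample:
    question: str
    gold_answer: str
    relevant_passage: str


def distraction_score(
    example: QAExample,
    candidate_passage: str,
    answer_log_prob: Callable[[str, str], float],
) -> float:
    """Score how distracting `candidate_passage` is for this example.

    `answer_log_prob(prompt, answer)` is a hypothetical helper that returns
    the log-probability the LLM assigns to `answer` given `prompt`.
    Higher return value = more distracting (larger confidence drop).
    """
    base_prompt = (
        f"Context: {example.relevant_passage}\n"
        f"Question: {example.question}\nAnswer:"
    )
    mixed_prompt = (
        f"Context: {example.relevant_passage}\n{candidate_passage}\n"
        f"Question: {example.question}\nAnswer:"
    )
    base = answer_log_prob(base_prompt, example.gold_answer)
    mixed = answer_log_prob(mixed_prompt, example.gold_answer)
    return base - mixed  # positive when the passage hurts the model
```

Under this assumed definition, ranking retrieved or synthetically generated passages by `distraction_score` would yield the "systematically curated distracting passages" used to build the enhanced fine-tuning dataset described above.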
Year: 2025
Conference: Italian Information Retrieval Workshop 2025
Keywords: Retrieval Augmented Generation, Large Language Models, Information Retrieval
Publication type: 04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1745522
Warning: the displayed data have not been validated by the university.
