Retrieval-Augmented Generation (RAG) systems enhance the performance of Large Language Models (LLMs) by incorporating external information fetched from a retriever component. While traditional approaches prioritize retrieving “relevant” documents, our research reveals that these documents can be a double-edged sword. We explore the counterintuitive benefits of integrating noisy, non-relevant documents into the retrieval process. In particular, we conduct an analysis of how different types of retrieved documents—relevant, distracting, and random—affect the overall effectiveness of RAG systems. Our findings reveal that the inclusion of random documents, often perceived as noise, can significantly improve LLM accuracy, with gains up to 35%. Conversely, highly scored but non-relevant documents from the retriever negatively impact performance. These insights challenge conventional retrieval strategies and suggest a paradigm shift towards rethinking information retrieval for neural models.
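The experimental condition the abstract describes — prompting the LLM with relevant documents plus randomly sampled, non-relevant ones — can be sketched as follows. This is a minimal illustration, not the authors' actual code; the function and parameter names are invented for the example.

```python
import random

def build_rag_prompt(query, relevant_docs, corpus, n_random=2, seed=None):
    """Compose an LLM prompt from relevant documents plus randomly
    sampled documents from the corpus (the "random noise" condition
    studied in the paper). All names here are illustrative."""
    rng = random.Random(seed)
    noise_docs = rng.sample(corpus, n_random)  # random, non-relevant documents
    context = "\n\n".join(relevant_docs + noise_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In the paper's terminology, "distracting" documents would instead be drawn from the retriever's highly scored but non-relevant results, which is the condition reported to hurt performance.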
Rethinking Relevance: How Noise and Distractors Impact Retrieval-Augmented Generation / Cuconasu, Florin; Trappolini, Giovanni; Siciliano, Federico; Filice, Simone; Campagnano, Cesare; Maarek, Yoelle; Tonellotto, Nicola; Silvestri, Fabrizio. - 3802 (2024), pp. 95-98. (Paper presented at the Italian Information Retrieval Workshop 2024, held in Udine, Italy).
Rethinking Relevance: How Noise and Distractors Impact Retrieval-Augmented Generation
Cuconasu, Florin; Trappolini, Giovanni; Siciliano, Federico; Filice, Simone; Campagnano, Cesare; Maarek, Yoelle; Tonellotto, Nicola; Silvestri, Fabrizio
2024
File: Coconasu_Rethinking_2024.pdf (open access)
Type: Publisher's version (published with the publisher's layout)
License: Creative Commons
Size: 249 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.