Rethinking Relevance: How Noise and Distractors Impact Retrieval-Augmented Generation

Cuconasu, Florin; Trappolini, Giovanni; Siciliano, Federico; Campagnano, Cesare; Tonellotto, Nicola; Silvestri, Fabrizio

2024

Abstract

Retrieval-Augmented Generation (RAG) systems enhance the performance of Large Language Models (LLMs) by incorporating external information fetched from a retriever component. While traditional approaches prioritize retrieving “relevant” documents, our research reveals that these documents can be a double-edged sword. We explore the counterintuitive benefits of integrating noisy, non-relevant documents into the retrieval process. In particular, we conduct an analysis of how different types of retrieved documents—relevant, distracting, and random—affect the overall effectiveness of RAG systems. Our findings reveal that the inclusion of random documents, often perceived as noise, can significantly improve LLM accuracy, with gains up to 35%. Conversely, highly scored but non-relevant documents from the retriever negatively impact performance. These insights challenge conventional retrieval strategies and suggest a paradigm shift towards rethinking information retrieval for neural models.
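As an illustration only (not code from the paper), the following minimal Python sketch shows the kind of document-mix ablation the abstract describes: combining retriever-returned relevant and distracting documents with randomly sampled corpus documents in the LLM prompt. The question, corpus, and the answer_with_llm stub are hypothetical placeholders.

```python
# Minimal sketch of a relevant / distracting / random document-mix ablation
# for a RAG prompt. Everything below (corpus, question, LLM stub) is
# illustrative, not the authors' code or data.
import random


def build_context(relevant, distracting, random_pool, n_random, seed=0):
    """Assemble the retrieved context: relevant and distracting documents,
    padded with documents sampled uniformly at random from the corpus."""
    rng = random.Random(seed)
    noise = rng.sample(random_pool, k=min(n_random, len(random_pool)))
    return relevant + distracting + noise


def rag_prompt(question, context_docs):
    """Format a simple RAG prompt: numbered context passages plus the question."""
    passages = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context_docs))
    return f"Context:\n{passages}\n\nQuestion: {question}\nAnswer:"


def answer_with_llm(prompt):
    """Placeholder for an LLM call; swap in any chat/completion API here."""
    raise NotImplementedError("plug in your LLM of choice")


if __name__ == "__main__":
    question = "Who wrote 'The Divine Comedy'?"
    relevant = ["Dante Alighieri wrote the Divine Comedy in the 14th century."]
    distracting = ["Petrarch, a contemporary of Dante, wrote the Canzoniere."]
    corpus = [
        "Basalt is a volcanic rock.",
        "The Nile flows north into the Mediterranean Sea.",
        "Photosynthesis takes place in chloroplasts.",
    ]
    prompt = rag_prompt(question, build_context(relevant, distracting, corpus, n_random=2))
    print(prompt)  # accuracy would then be measured over a QA benchmark
```

Accuracy under each mix (relevant only, relevant + distracting, relevant + random, etc.) could then be compared over a QA benchmark, which is the comparison the abstract reports.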
Conference: Italian Information Retrieval Workshop 2024
Keywords: Information Retrieval; Retrieval-Augmented Generation; Large Language Models
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Rethinking Relevance: How Noise and Distractors Impact Retrieval-Augmented Generation / Cuconasu, Florin; Trappolini, Giovanni; Siciliano, Federico; Filice, Simone; Campagnano, Cesare; Maarek, Yoelle; Tonellotto, Nicola; Silvestri, Fabrizio. - 3802:(2024), pp. 95-98. (Paper presented at the Italian Information Retrieval Workshop 2024, held in Udine, Italy).
Files attached to this item

File: Coconasu_Rethinking_2024.pdf
Access: open access
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 249 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11573/1723801