Retrieval-Augmented Generation (RAG) systems enhance the performance of Large Language Models (LLMs) by incorporating external information fetched from a retriever component. While traditional approaches prioritize retrieving “relevant” documents, our research reveals that these documents can be a double-edged sword. We explore the counterintuitive benefits of integrating noisy, non-relevant documents into the retrieval process. In particular, we conduct an analysis of how different types of retrieved documents—relevant, distracting, and random—affect the overall effectiveness of RAG systems. Our findings reveal that the inclusion of random documents, often perceived as noise, can significantly improve LLM accuracy, with gains up to 35%. Conversely, highly scored but non-relevant documents from the retriever negatively impact performance. These insights challenge conventional retrieval strategies and suggest a paradigm shift towards rethinking information retrieval for neural models.
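The experimental condition the abstract describes — prompting the LLM with relevant documents plus randomly sampled, non-relevant ones — can be sketched as follows. This is a minimal illustration, not the authors' actual code; the function and parameter names are invented for the example.

```python
import random

def build_rag_prompt(query, relevant_docs, corpus, n_random=2, seed=None):
    """Compose an LLM prompt from relevant documents plus randomly
    sampled documents from the corpus (the "random noise" condition
    studied in the paper). All names here are illustrative."""
    rng = random.Random(seed)
    noise_docs = rng.sample(corpus, n_random)  # random, non-relevant documents
    context = "\n\n".join(relevant_docs + noise_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In the paper's terminology, "distracting" documents would instead be drawn from the retriever's highly scored but non-relevant results, which is the condition reported to hurt performance.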
Rethinking Relevance: How Noise and Distractors Impact Retrieval-Augmented Generation / Cuconasu, Florin; Trappolini, Giovanni; Siciliano, Federico; Filice, Simone; Campagnano, Cesare; Maarek, Yoelle; Tonellotto, Nicola; Silvestri, Fabrizio. - 3802 (2024), pp. 95-98. (Paper presented at the Italian Information Retrieval Workshop 2024, held in Udine, Italy).
Rethinking Relevance: How Noise and Distractors Impact Retrieval-Augmented Generation
Cuconasu, Florin; Trappolini, Giovanni; Siciliano, Federico; Filice, Simone; Campagnano, Cesare; Maarek, Yoelle; Tonellotto, Nicola; Silvestri, Fabrizio
2024
File: Coconasu_Rethinking_2024.pdf (open access)
Type: Publisher's version (published with the publisher's layout)
License: Creative Commons
Size: 249 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.