Recent advances in Information Retrieval have leveraged high-dimensional embedding spaces to improve the retrieval of relevant documents. The Manifold Clustering Hypothesis suggests that, despite these high-dimensional representations, documents relevant to a query reside on a lower-dimensional, query-dependent manifold. While this hypothesis has inspired new retrieval methods, existing approaches still struggle to separate non-relevant information from relevant signals. We propose a novel methodology that addresses these limitations by leveraging information from both relevant and non-relevant documents. Our method, Eclipse, computes a centroid from irrelevant documents and uses it as a reference to estimate the noisy dimensions present in relevant ones, enhancing retrieval performance. Extensive experiments on three in-domain benchmarks and one out-of-domain benchmark demonstrate an average improvement of up to 21.03% (resp. 22.88%) in mAP (AP) and 12.04% (resp. 14.18%) in nDCG@10 relative to the DIME-based baseline (resp. the baseline using all dimensions). Our results pave the way for more robust, pseudo-irrelevance-based retrieval systems in future IR research.
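
The idea of using an irrelevant-document centroid as a reference for noisy dimensions can be illustrated with a minimal sketch. The abstract does not give Eclipse's scoring formula, so everything below is an assumption made for illustration: the per-dimension score `q_i * (c_pos_i - c_neg_i)`, the function names, the hard top-`keep` masking, and the toy vectors are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def contrastive_importance(
    query: Vector,
    pseudo_relevant: Sequence[Vector],
    pseudo_irrelevant: Sequence[Vector],
) -> List[float]:
    """Score each embedding dimension of the query.

    ASSUMPTION (not from the paper): a dimension matters when the query
    agrees with the pseudo-relevant centroid more than with the
    pseudo-irrelevant one, i.e. score_i = q_i * (c_pos_i - c_neg_i).
    """
    c_pos = centroid(pseudo_relevant)
    c_neg = centroid(pseudo_irrelevant)
    return [q * (p - n) for q, p, n in zip(query, c_pos, c_neg)]


def keep_top_dimensions(query: Vector, scores: Sequence[float], keep: int) -> List[float]:
    """Zero out every query dimension except the `keep` highest-scoring ones."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:keep])
    return [q if i in kept else 0.0 for i, q in enumerate(query)]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity used to rescore documents."""
    return sum(x * y for x, y in zip(a, b))


# Toy 3-dimensional example (hypothetical data).
query = [1.0, 1.0, 1.0]
pseudo_relevant = [[0.9, 0.8, 0.1], [0.7, 0.8, 0.3]]    # e.g. top-ranked results
pseudo_irrelevant = [[0.1, 0.8, 0.2], [0.3, 0.8, 0.2]]  # e.g. low-ranked results

scores = contrastive_importance(query, pseudo_relevant, pseudo_irrelevant)
masked_query = keep_top_dimensions(query, scores, keep=1)

# Only the first dimension separates the two groups, so only it is kept.
relevant_score = dot(masked_query, pseudo_relevant[0])
irrelevant_score = dot(masked_query, pseudo_irrelevant[0])
```

In this toy setting the second dimension is large in both groups and therefore carries no contrast. Masking it leaves the query with only the first dimension, which is the one that separates the pseudo-relevant vectors from the pseudo-irrelevant ones. How the two document sets are selected in practice, and how many dimensions to retain, are choices this sketch leaves open.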

Eclipse: Contrastive Dimension Importance Estimation with Pseudo-Irrelevance Feedback for Dense Retrieval / D'Erasmo, Giulio; Trappolini, Giovanni; Silvestri, Fabrizio; Tonellotto, Nicola. - (2025), pp. 147-154. (ICTIR, Padua, Italy) [10.1145/3731120.3744579].

Eclipse: Contrastive Dimension Importance Estimation with Pseudo-Irrelevance Feedback for Dense Retrieval

Giulio D'Erasmo; Giovanni Trappolini; Fabrizio Silvestri; Nicola Tonellotto
2025

ICTIR
dimension importance estimation; information retrieval
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record
File: DErasmo_ECLIPSE_2025.pdf
Access: open access
Note: https://doi.org/10.1145/3731120.3744579
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 681.17 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1754247