Giulio d’Erasmo - From Eclipse to Theory: Advancing Dimension Importance Estimation for Information Retrieval
- Date: 29 September 2025, 1:00 pm
- Room: 65-66 304
Dense retrieval models rely on high-dimensional embeddings, but many dimensions encode noise rather than relevance, leading to suboptimal ranking. Dimension Importance Estimation (DIME) was recently proposed to mitigate this issue by identifying and retaining only the informative dimensions. In this presentation, I will introduce Eclipse, a contrastive extension of DIME that leverages pseudo-irrelevant feedback. By constructing centroids from low-ranked documents and subtracting their contribution from the relevant signals, Eclipse down-weights noisy features while highlighting semantically meaningful ones. This plug-in approach requires no retraining of the underlying retriever and achieves substantial effectiveness gains across multiple TREC benchmarks and state-of-the-art dense models (e.g., up to +22.9% MAP and +14.2% nDCG@10 over baselines). Building on this, my subsequent work provides a statistical formalisation of DIME that offers theoretical guarantees, motivates kernel-based weighting schemes as more consistent estimators than uniform baselines in IR, and gives a statistical framework for estimating in advance the number of dimensions to retain. Finally, in ongoing work, we are extending the DIME paradigm to recommender systems, where distinguishing informative signals from noisy features is equally crucial.
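As a rough illustration of the contrastive idea described above, the following Python sketch scores each embedding dimension by combining the query with a centroid of top-ranked (pseudo-relevant) documents minus a centroid of low-ranked (pseudo-irrelevant) documents, then keeps only the highest-scoring dimensions. The function names, the weight `beta`, the cut-offs `k_pos` and `k_neg`, and the exact scoring formula are assumptions made for this example; the formulation used in Eclipse may differ.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def contrastive_importance(query, ranked_docs, k_pos=2, k_neg=2, beta=1.0):
    """Per-dimension score: query weight times (positive minus negative centroid).

    ranked_docs holds document embeddings in the order returned by the
    retriever (best first). beta, k_pos and k_neg are illustrative knobs.
    """
    pos = centroid(ranked_docs[:k_pos])    # pseudo-relevant feedback
    neg = centroid(ranked_docs[-k_neg:])   # pseudo-irrelevant feedback
    return [q * (p - beta * n) for q, p, n in zip(query, pos, neg)]


def mask_query(query, scores, keep):
    """Zero out every query dimension except the `keep` highest-scoring ones."""
    order = sorted(range(len(query)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:keep])
    return [q if i in kept else 0.0 for i, q in enumerate(query)]


def dot(a, b):
    """Inner product, used here as the retrieval score."""
    return sum(x * y for x, y in zip(a, b))


# Toy data: dimensions 0 and 1 separate top from bottom documents,
# dimension 2 is uninformative, dimension 3 favours low-ranked documents.
query = [1.0, 1.0, 1.0, 1.0]
ranked_docs = [
    [0.9, 0.8, 0.5, 0.1],
    [0.7, 0.9, 0.5, 0.2],
    [0.1, 0.2, 0.5, 0.6],
    [0.0, 0.1, 0.5, 0.9],
]

scores = contrastive_importance(query, ranked_docs)
masked_query = mask_query(query, scores, keep=2)

# Re-score the documents with the masked query; the retriever is untouched.
rescored = [dot(masked_query, d) for d in ranked_docs]
```

Because only the query vector is masked at search time, a sketch like this leaves the document index and the retriever weights unchanged, which is in the spirit of the plug-in, no-retraining property mentioned in the abstract.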