Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins

Representation-based retrieval models, so-called bi-encoders, estimate the relevance of a document to a query by calculating the similarity of their respective embeddings. Current state-of-the-art bi-encoders are trained using an expensive training regime involving knowledge distillation from a teacher model and batch sampling. Instead of relying on a teacher model, we contribute a novel parameter-free loss function for self-supervision that exploits the pre-trained language modeling capabilities of the encoder model as a training signal, eliminating the need for batch sampling by performing implicit hard negative mining. We investigate the capabilities of our proposed approach through extensive experiments, demonstrating that self-distillation can match the effectiveness of teacher distillation using only 13.5% of the data, while offering a 3x to 15x speedup in training time compared to parametrized losses. All code and data are made openly available.
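To make the idea concrete, below is a minimal, hypothetical sketch (in PyTorch) of a bi-encoder margin loss with per-pair adaptive margins derived from a frozen copy of the pre-trained encoder, using in-batch documents as implicit negatives. The function name, the exact margin definition, and the use of cosine similarity are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch only: adaptive-margin loss over in-batch negatives for a bi-encoder.
# `self_scores` stands in for relevance estimates produced by the frozen,
# pre-trained encoder itself (no separate teacher model); this is an assumed
# interface, not the authors' code.
import torch
import torch.nn.functional as F


def adaptive_margin_loss(query_emb: torch.Tensor,
                         doc_emb: torch.Tensor,
                         self_scores: torch.Tensor) -> torch.Tensor:
    """query_emb:   (B, d) query embeddings from the trainable encoder
    doc_emb:     (B, d) embeddings of each query's positive document
    self_scores: (B,)   self-supervision scores used as adaptive margins
                        in place of a fixed margin hyperparameter.
    """
    # Cosine similarity between every query and every document in the batch;
    # the diagonal holds positive pairs, off-diagonal entries serve as
    # implicit (in-batch) negatives.
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    sim = q @ d.T                                   # (B, B)

    pos = sim.diag().unsqueeze(1)                   # (B, 1) positive scores
    margin = self_scores.unsqueeze(1)               # (B, 1) adaptive margins

    # Hinge-style penalty whenever a negative comes within the adaptive
    # margin of its query's positive; the mask drops the diagonal.
    mask = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    loss = F.relu(margin - pos + sim)[mask]
    return loss.mean()


# Toy usage with random embeddings and scores:
B, dim = 8, 32
loss = adaptive_margin_loss(torch.randn(B, dim),
                            torch.randn(B, dim),
                            torch.rand(B))
print(loss.item())
```

Because the margin is data-dependent rather than a tuned constant, the loss itself stays parameter-free; hard negatives are those in-batch documents that violate the margin, so no explicit negative mining or batch-sampling step is needed in this sketch.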
@article{gienapp2025_2407.21515,
  title   = {Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins},
  author  = {Lukas Gienapp and Niklas Deckers and Martin Potthast and Harrisen Scells},
  journal = {arXiv preprint arXiv:2407.21515},
  year    = {2025}
}