Aligning Web Query Generation with Ranking Objectives via Direct Preference Optimization

Neural retrieval models excel in Web search, but their training requires substantial amounts of labeled query-document pairs, which are costly to obtain. With the widespread availability of Web document collections such as ClueWeb22, synthetic queries generated by large language models offer a scalable alternative. However, synthetic training queries often vary in quality, leading to suboptimal downstream retrieval performance. Existing methods typically filter out noisy query-document pairs based on signals from an external re-ranker. In contrast, we propose a framework that leverages Direct Preference Optimization (DPO) to integrate ranking signals into the query generation process, directly optimizing the model towards generating high-quality queries that maximize downstream retrieval effectiveness. Experiments show higher ranker-assessed relevance for generated query-document pairs after DPO, leading to stronger downstream performance on the MS MARCO benchmark compared to baseline models trained with synthetic data.
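The core idea can be illustrated with a small sketch: for each document, several candidate queries are sampled and scored by a re-ranker; the best- and worst-scored queries form a (chosen, rejected) preference pair, and the generator is then optimized with the standard DPO objective. The function names, the `min_gap` threshold, and the toy scores below are illustrative assumptions, not the paper's implementation.

```python
import math

def build_preference_pairs(doc_queries, min_gap=0.1):
    """Turn ranker-scored synthetic queries into DPO preference pairs.

    doc_queries: {doc_id: [(query_text, ranker_score), ...]}
    Returns [(doc_id, chosen_query, rejected_query)] where the score gap
    between chosen and rejected is at least `min_gap` (assumed threshold).
    """
    pairs = []
    for doc_id, scored in doc_queries.items():
        ranked = sorted(scored, key=lambda q: q[1], reverse=True)
        (chosen, s_best), (rejected, s_worst) = ranked[0], ranked[-1]
        if s_best - s_worst >= min_gap:
            pairs.append((doc_id, chosen, rejected))
    return pairs

def dpo_loss(lp_chosen, lp_rejected, ref_lp_chosen, ref_lp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((policy margin) - (reference margin))),
    where each margin is log p(chosen) - log p(rejected)."""
    margin = (lp_chosen - ref_lp_chosen) - (lp_rejected - ref_lp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy example: one document with three ranker-scored candidate queries.
pairs = build_preference_pairs(
    {"d1": [("cheap flights to lisbon", 0.9),
            ("lisbon", 0.5),
            ("random string", 0.2)]}
)
```

In practice the two log-probability margins come from the fine-tuned query generator and a frozen reference copy of it; the loss pushes the generator to prefer queries the re-ranker judged more relevant to their source document.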
@article{coelho2025_2505.19307,
  title   = {Aligning Web Query Generation with Ranking Objectives via Direct Preference Optimization},
  author  = {João Coelho and Bruno Martins and João Magalhães and Chenyan Xiong},
  journal = {arXiv preprint arXiv:2505.19307},
  year    = {2025}
}