Guided Speculative Inference for Efficient Test-Time Alignment of LLMs

4 June 2025
Jonathan Geuter
Youssef Mroueh
David Alvarez-Melis
Main: 6 pages
3 figures
Bibliography: 2 pages
2 tables
Appendix: 4 pages
Abstract

We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-$n$ test-time scaling with a reward model $r(x,y)$ and speculative samples from a small auxiliary model $\pi_S(y\mid x)$. We provably approximate the optimal tilted policy $\pi_{\beta,B}(y\mid x) \propto \pi_B(y\mid x)\exp(\beta\,r(x,y))$ of soft best-of-$n$ under the primary model $\pi_B$. We derive a theoretical bound on the KL divergence between our induced distribution and the optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math), our method achieves higher accuracy than standard soft best-of-$n$ with $\pi_S$ and reward-guided speculative decoding (Liao et al., 2025), and in certain settings even outperforms soft best-of-$n$ with $\pi_B$. The code is available at this https URL.
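The abstract's target is the tilted policy $\pi_{\beta,B}(y\mid x) \propto \pi_B(y\mid x)\exp(\beta\,r(x,y))$, which soft best-of-$n$ approximates by drawing $n$ candidates and resampling one with probability proportional to $\exp(\beta\,r)$. The minimal sketch below uses a hypothetical four-response toy problem; every probability, reward, and function name in it is invented for illustration. It shows why soft best-of-$n$ over small-model samples alone targets the tilted $\pi_S$ rather than the tilted $\pi_B$, and how multiplying each weight by the importance ratio $\pi_B/\pi_S$ (standard self-normalized importance resampling) moves the result toward $\pi_{\beta,B}$. This correction is an assumption chosen to illustrate the target distribution; it is not presented as the GSI algorithm itself, which the paper specifies along with its KL bound.

```python
import math
import random


def tilted_policy(base_probs, rewards, beta):
    """Exact tilted distribution pi_{beta,B}(y) ∝ pi_B(y) * exp(beta * r(y)) on a finite set."""
    logits = [math.log(p) + beta * rw for p, rw in zip(base_probs, rewards)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in logits]
    z = sum(weights)
    return [w / z for w in weights]


def soft_best_of_n(proposal_probs, rewards, beta, n, rng, base_probs=None):
    """Soft best-of-n: draw n candidates from the proposal, resample one with prob ∝ weight.

    The weight is exp(beta * r(y)). If base_probs is given, it is multiplied by the
    importance ratio pi_B(y) / pi_S(y), so resampling targets the tilted pi_B instead
    of the tilted proposal (self-normalized importance resampling; an assumption here).
    """
    candidates = rng.choices(range(len(proposal_probs)), weights=proposal_probs, k=n)
    log_w = []
    for y in candidates:
        lw = beta * rewards[y]
        if base_probs is not None:
            lw += math.log(base_probs[y]) - math.log(proposal_probs[y])
        log_w.append(lw)
    m = max(log_w)  # stabilize before exponentiating
    w = [math.exp(v - m) for v in log_w]
    return rng.choices(candidates, weights=w, k=1)[0]


def kl(p, q):
    """KL(p || q) for finite distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def empirical(samples, k):
    """Empirical distribution of integer samples in range(k)."""
    counts = [0] * k
    for s in samples:
        counts[s] += 1
    return [c / len(samples) for c in counts]


# Hypothetical toy problem: four candidate responses to a single prompt x.
pi_B = [0.5, 0.3, 0.15, 0.05]    # stand-in for the primary model pi_B(y | x)
pi_S = [0.25, 0.25, 0.25, 0.25]  # stand-in for the small model pi_S(y | x)
r = [0.0, 0.2, 0.8, 1.0]         # stand-in for the reward model r(x, y)
beta, n, trials = 2.0, 16, 10_000

target = tilted_policy(pi_B, r, beta)
rng = random.Random(0)
approx_plain = empirical([soft_best_of_n(pi_S, r, beta, n, rng) for _ in range(trials)], 4)
approx_is = empirical(
    [soft_best_of_n(pi_S, r, beta, n, rng, base_probs=pi_B) for _ in range(trials)], 4
)

print("target tilted pi_B:     ", [round(p, 3) for p in target])
print("soft BoN from pi_S:     ", [round(p, 3) for p in approx_plain], kl(approx_plain, target))
print("soft BoN from pi_S + IS:", [round(p, 3) for p in approx_is], kl(approx_is, target))
```

In a real LLM setting, the ratio $\pi_B(y\mid x)/\pi_S(y\mid x)$ requires scoring each draft under the primary model, which is a single teacher-forced forward pass per candidate rather than token-by-token generation. How GSI actually uses $\pi_B$ and $\pi_S$ is described in the paper, not in this sketch.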

@article{geuter2025_2506.04118,
  title={Guided Speculative Inference for Efficient Test-Time Alignment of LLMs},
  author={Jonathan Geuter and Youssef Mroueh and David Alvarez-Melis},
  journal={arXiv preprint arXiv:2506.04118},
  year={2025}
}