
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs

Main: 6 pages · Bibliography: 2 pages · Appendix: 4 pages · 3 figures · 2 tables
Abstract

We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-$n$ test-time scaling with a reward model $r(x,y)$ and speculative samples from a small auxiliary model $\pi_S(y\mid x)$. We provably approximate the optimal tilted policy $\pi_{\beta,B}(y\mid x) \propto \pi_B(y\mid x)\exp(\beta\,r(x,y))$ of soft best-of-$n$ under the primary model $\pi_B$. We derive a theoretical bound on the KL divergence between our induced distribution and the optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math), our method achieves higher accuracy than standard soft best-of-$n$ with $\pi_S$ and reward-guided speculative decoding (Liao et al., 2025), and in certain settings even outperforms soft best-of-$n$ with $\pi_B$. The code is available at this https URL.
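The core sampling step the abstract describes can be sketched in a few lines. The sketch below is a hedged illustration, not the authors' implementation: it draws $n$ speculative candidates from the small model $\pi_S$ and resamples one of them with self-normalized importance weights $w_i \propto \frac{\pi_B(y_i\mid x)}{\pi_S(y_i\mid x)}\exp(\beta\,r(x,y_i))$, which approximates the tilted policy $\pi_{\beta,B}$ under $\pi_B$. The specific weight formula, function names, and all numeric inputs are illustrative assumptions; the paper's exact algorithm and its KL bound are in the full text.

```python
import math
import random

def soft_best_of_n(candidates, log_p_big, log_p_small, rewards, beta, rng=None):
    """Resample one of n speculative candidates.

    Illustrative sketch (not the paper's code): weights are
    (pi_B / pi_S) * exp(beta * r), self-normalized over the batch,
    approximating pi_{beta,B}(y|x) ∝ pi_B(y|x) exp(beta r(x,y)).

    log_p_big / log_p_small: per-candidate log-probs under pi_B / pi_S.
    rewards: per-candidate reward-model scores r(x, y_i).
    """
    rng = rng or random.Random()
    # Work in log space, then shift by the max for numerical stability.
    log_w = [lb - ls + beta * r
             for lb, ls, r in zip(log_p_big, log_p_small, rewards)]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    probs = [x / total for x in w]  # self-normalized importance weights
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[idx], probs

# Toy usage with made-up numbers (no real models involved):
cands = ["y1", "y2", "y3"]
choice, probs = soft_best_of_n(
    cands,
    log_p_big=[-1.0, -2.0, -0.5],    # hypothetical log pi_B(y_i|x)
    log_p_small=[-1.2, -1.1, -0.9],  # hypothetical log pi_S(y_i|x)
    rewards=[0.2, 0.9, 0.4],         # hypothetical r(x, y_i)
    beta=4.0,
    rng=random.Random(0),
)
print(choice in cands, abs(sum(probs) - 1.0) < 1e-9)
```

As $\beta \to 0$ the weights reduce to plain importance sampling toward $\pi_B$; as $\beta$ grows the procedure approaches hard best-of-$n$ under the reward.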

@article{geuter2025_2506.04118,
  title={Guided Speculative Inference for Efficient Test-Time Alignment of LLMs},
  author={Jonathan Geuter and Youssef Mroueh and David Alvarez-Melis},
  journal={arXiv preprint arXiv:2506.04118},
  year={2025}
}