Scaling LLM Inference with Optimized Sample Compute Allocation

29 October 2024

Kexun Zhang

William Yang Wang


Papers citing "Scaling LLM Inference with Optimized Sample Compute Allocation"

5 / 5 papers shown

Title
DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling | Fei Wang, Xingchen Wan, Ruoxi Sun, Jiefeng Chen, Sercan Ö. Arık | LRM | 39 0 0 | 19 Jun 2025
Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory | Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan | LRM | 82 0 0 | 16 May 2025
METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling | Bingxuan Li, Yiwei Wang, Jiuxiang Gu, Kai-Wei Chang, Nanyun Peng | AI4CE | 124 5 0 | 24 Feb 2025
Optimizing Temperature for Language Models with Multi-Sample Inference | Weihua Du, Yiming Yang, Sean Welleck | | 184 4 0 | 07 Feb 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini | ALM, LRM | 330 331 0 | 03 Jan 2025
