
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

29 January 2025
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
Wenxuan Zhou
Z. Zeng
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
Abstract

Solving mathematics problems has been an intriguing capability of large language models, and many efforts have been made to improve reasoning by extending reasoning length, such as through self-correction and extensive long chains of thought. While promising in problem-solving, advanced long reasoning chain models exhibit an undesired single-modal behavior, where trivial questions require unnecessarily tedious long chains of thought. In this work, we propose a way to make models aware of inference budgets by formulating inference-budget allocation as utility maximization with respect to an inference budget constraint, hence naming our algorithm Inference Budget-Constrained Policy Optimization (IBPO). In a nutshell, models fine-tuned through IBPO learn to "understand" the difficulty of queries and allocate inference budgets to harder ones. With different inference budgets, our best models achieve 4.14% and 5.74% absolute improvements (8.08% and 11.2% relative improvements) on MATH500 using 2.16x and 4.32x inference budgets respectively, relative to LLaMA3.1 8B Instruct. These improvements are approximately 2x those of self-consistency under the same budgets.
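
To make the abstract's description concrete, below is a minimal sketch of the kind of budget-constrained objective it suggests. This is an illustrative formulation only, not the paper's exact optimization problem: the policy \(\pi\), utility \(U\), response-length cost \(\ell\), data distribution \(\mathcal{D}\), and budget \(B\) are placeholder symbols assumed here for exposition.

\[
\max_{\pi} \;\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[\, U(x, y) \,\big]
\quad \text{s.t.} \quad
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[\, \ell(y) \,\big] \;\le\; B
\]

Here \(U(x, y)\) is a task utility (e.g., answer correctness), \(\ell(y)\) is the inference cost of a response (e.g., its token length), and \(B\) is the total inference budget. Under such a constraint, spending long chains of thought on easy queries wastes budget without improving utility, so a policy that satisfies the constraint is pushed to reserve longer reasoning for harder queries, which matches the behavior the abstract describes.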
