Unlocking Recursive Thinking of LLMs: Alignment via Refinement

6 June 2025
Haoke Zhang
Xiaobo Liang
Cunxiang Wang
Juntao Li
Min Zhang
Main: 8 pages · 6 figures · 7 tables · Bibliography: 3 pages · Appendix: 3 pages
Abstract

The OpenAI o1-series models have demonstrated that leveraging long-form Chain of Thought (CoT) can substantially enhance performance. However, the recursive thinking capabilities of Large Language Models (LLMs) remain limited, particularly in the absence of expert-curated data for distillation. In this paper, we propose AvR: Alignment via Refinement, a novel method aimed at unlocking the potential of LLMs for recursive reasoning through long-form CoT. AvR introduces a refinement process that integrates criticism and improvement actions, guided by differentiable learning techniques to optimize refinement-aware rewards. As a result, the synthesized multi-round data can be organized as a long refinement thought, further enabling test-time scaling. Experimental results show that AvR significantly outperforms conventional preference optimization methods. Notably, with only 3k synthetic samples, our method boosts the performance of the LLaMA-3-8B-Instruct model by over 20% in win rate on AlpacaEval 2.0. Our code is available on GitHub (this https URL).
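As a rough illustration of the refinement process sketched in the abstract, the Python snippet below shows a criticize-then-improve loop in which a round is kept only if a reward model scores the revision higher than the previous answer, and the accumulated rounds form a single long refinement trace. This is a minimal sketch under stated assumptions: the `generate` and `judge_score` callables, the prompt templates, and the trace format are hypothetical placeholders, not the authors' actual pipeline.

```python
# Illustrative criticize-then-refine loop with a refinement-aware reward.
# `generate` and `judge_score` are hypothetical stand-ins for an LLM and a
# reward model; they are NOT part of the AvR codebase.

from typing import Callable, List, Tuple


def refine_loop(
    question: str,
    generate: Callable[[str], str],             # LLM text-generation callable
    judge_score: Callable[[str, str], float],   # reward: (question, answer) -> score
    max_rounds: int = 3,
) -> Tuple[str, List[dict]]:
    """Iteratively criticize and improve an answer, keeping the full trace
    so it can later be serialized as one long refinement thought."""
    answer = generate(f"Question: {question}\nAnswer:")
    trace = [{"role": "answer", "text": answer, "reward": judge_score(question, answer)}]

    for _ in range(max_rounds):
        # Criticism action: ask the model to point out flaws in its own answer.
        critique = generate(
            f"Question: {question}\nCurrent answer: {answer}\n"
            "List the concrete weaknesses of this answer:"
        )
        # Improvement action: revise the answer conditioned on the critique.
        revised = generate(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer:"
        )
        old_r, new_r = judge_score(question, answer), judge_score(question, revised)
        trace.append({"role": "critique", "text": critique})
        trace.append({"role": "answer", "text": revised, "reward": new_r})

        # Refinement-aware signal: keep the round only if the revision actually
        # improves the reward; otherwise stop and return the best answer so far.
        if new_r <= old_r:
            break
        answer = revised

    # The trace can be flattened into a single long-form CoT training example.
    return answer, trace
```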

@article{zhang2025_2506.06009,
  title={Unlocking Recursive Thinking of LLMs: Alignment via Refinement},
  author={Haoke Zhang and Xiaobo Liang and Cunxiang Wang and Juntao Li and Min Zhang},
  journal={arXiv preprint arXiv:2506.06009},
  year={2025}
}