ResearchTrend.AI
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation


17 March 2025
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
Y. Fu
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
    LRM

Papers citing "Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation"

2 / 2 papers shown
Title
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu
Jiahao Lin
Qichao Zhang
Xiangyu Tian
Linjing Li
Xiangyuan Lan
Dongbin Zhao
OffRL
ReLM
LRM
15
0
0
16 May 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
100
4
0
09 Apr 2025