2503.12854
Cited By
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
17 March 2025
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
Y. Fu
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
Papers citing
"Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation"
2 / 2 papers shown
Title
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu
Jiahao Lin
Qichao Zhang
Xiangyu Tian
Linjing Li
Xiangyuan Lan
Dongbin Zhao
OffRL
ReLM
LRM
15
0
0
16 May 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
100
4
0
09 Apr 2025