2502.03095
Cited By
Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms
5 February 2025
Xuerui Su
Yue Wang
Jinhua Zhu
Mingyang Yi
Feng Xu
Zhiming Ma
Yuting Liu
Papers citing "Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms"
1 / 1 papers shown
Title
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
Xuerui Su
Shufang Xie
Guoqing Liu
Yingce Xia
Renqian Luo
Peiran Jin
Zhiming Ma
Yue Wang
Zun Wang
Yuting Liu
LRM
90
5
0
06 Apr 2025