Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms

5 February 2025

Papers citing "Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms"


Title: Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
Authors: Xuerui Su, Shufang Xie, Guoqing Liu, Yingce Xia, Renqian Luo, Peiran Jin, Zhiming Ma, Yue Wang, Zun Wang, Yuting Liu
LRM 90 5 0
Date: 06 Apr 2025