ResearchTrend.AI
Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms


5 February 2025
Xuerui Su
Yue Wang
Jinhua Zhu
Mingyang Yi
Feng Xu
Zhiming Ma
Yuting Liu
arXiv: 2502.03095

Papers citing "Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms"

1 / 1 papers shown
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
Xuerui Su
Shufang Xie
Guoqing Liu
Yingce Xia
Renqian Luo
Peiran Jin
Zhiming Ma
Yue Wang
Zun Wang
Yuting Liu
LRM
06 Apr 2025