2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

25 October 2024
Shilong Li
Yancheng He
Hui Huang
Xingyuan Bu
Qingbin Liu
Hangyu Guo
Weixun Wang
Jihao Gu
Wenbo Su
Bo Zheng

Papers citing "2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision"

5 / 5 papers shown
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
Weixun Wang
Shaopan Xiong
Gengru Chen
Wei Gao
Sheng Guo
...
Lin Qu
Wenbo Su
Wei Wang
Jiamang Wang
Bo Zheng
OffRL
63
0
0
06 Jun 2025
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jing Liu
Hangyu Guo
Ranjie Duan
Xingyuan Bu
Yancheng He
...
Yingshui Tan
Yanan Wu
Jihao Gu
Yongbin Li
Jun Zhu
MLLM
456
0
0
25 Apr 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He
Shilong Li
Jing Liu
Weixun Wang
Xingyuan Bu
...
Zhongyuan Peng
Zhenru Zhang
Zhicheng Zheng
Wenbo Su
Bo Zheng
ELM, LRM
164
17
0
26 Feb 2025
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
Yuxin Jiang
Bo Huang
Yufei Wang
Xingshan Zeng
Liangyou Li
Yasheng Wang
Xin Jiang
Lifeng Shang
Ruiming Tang
Wei Wang
127
7
0
14 Aug 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
164
403
0
06 Apr 2024