2410.19720
Cited By
2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
25 October 2024
Shilong Li, Yancheng He, Hui Huang, Xingyuan Bu, Qingbin Liu, Hangyu Guo, Weixun Wang, Jihao Gu, Wenbo Su, Bo Zheng
Papers citing "2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision"
5 / 5 papers shown
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, ..., Lin Qu, Wenbo Su, Wei Wang, Jiamang Wang, Bo Zheng
OffRL
63
0
0
06 Jun 2025
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jing Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, ..., Yingshui Tan, Yanan Wu, Jihao Gu, Yongbin Li, Jun Zhu
MLLM
456
0
0
25 Apr 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He, Shilong Li, Jing Liu, Weixun Wang, Xingyuan Bu, ..., Zhongyuan Peng, Zhenru Zhang, Zhicheng Zheng, Wenbo Su, Bo Zheng
ELM
LRM
164
17
0
26 Feb 2025
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
Yuxin Jiang, Bo Huang, Yufei Wang, Xingshan Zeng, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Wei Wang
127
7
0
14 Aug 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori Hashimoto
ALM
164
403
0
06 Apr 2024