2505.23316
Cited By
Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
29 May 2025
Kaiyang Guo
Yinchuan Li
Zhitang Chen
Papers citing "Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO"
4 / 4 papers shown
Title
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, ..., Shiyu Wang, S. Yu, Shunfeng Zhou, Shuting Pan, S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
390 | 2,024 | 0
22 Jan 2025
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin
212 | 35 | 0
11 Oct 2024
A Closer Look at Machine Unlearning for Large Language Models
Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin
MU
259 | 13 | 0
10 Oct 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori Hashimoto
ALM
167 | 403 | 0
06 Apr 2024