Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.02363
Cited By
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
5 May 2025
Tianjian Li
Daniel Khashabi
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning"
3 / 3 papers shown
Title
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin
Sadhika Malladi
Adithya Bhaskar
Danqi Chen
Sanjeev Arora
Boris Hanin
223
35
0
11 Oct 2024
Benchmarking Language Model Creativity: A Case Study on Code Generation
Yining Lu
Dixuan Wang
Tianjian Li
Dongwei Jiang
Daniel Khashabi
Meng Jiang
Daniel Khashabi
LRM
136
15
0
12 Jul 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
405
340
0
18 Jan 2024
1