Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.20065
Cited By
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
26 May 2025
Geon-hyeong Kim
Youngsoo Jang
Yu Jin Kim
Byoungjip Kim
Honglak Lee
Kyunghoon Bae
Moontae Lee
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety"
4 / 4 papers shown
Title
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
66
580
0
18 Oct 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
133
4,085
0
09 Jun 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
53
569
0
22 May 2023
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
47
243
0
03 Oct 2022
1