2504.02193
Cited By
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
3 April 2025
Yifan Wang
Runjin Chen
Bolian Li
David Cho
Yihe Deng
Ruqi Zhang
Tianlong Chen
Zhangyang Wang
A. Grama
Junyuan Hong
SyDa
Papers citing "More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment"
3 / 3 papers shown
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
287
1,449
0
27 Jul 2023
Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Erfan Shayegani
Yue Dong
Nael B. Abu-Ghazaleh
85
145
0
26 Jul 2023
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
463
19,006
0
20 Jul 2017