ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2504.02193
  4. Cited By
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

3 April 2025
Yifan Wang
Runjin Chen
Bolian Li
David Cho
Yihe Deng
Ruqi Zhang
Tianlong Chen
Zhangyang Wang
A. Grama
Junyuan Hong
    SyDa
ArXivPDFHTML

Papers citing "More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment"

3 / 3 papers shown
Title
Universal and Transferable Adversarial Attacks on Aligned Language
  Models
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
282
1,449
0
27 Jul 2023
Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal
  Language Models
Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Erfan Shayegani
Yue Dong
Nael B. Abu-Ghazaleh
85
145
0
26 Jul 2023
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
448
19,006
0
20 Jul 2017
1