Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

12 August 2024
Karel D'Oosterlinck, Winnie Xu, Chris Develder, Thomas Demeester, A. Singh, Christopher Potts, Douwe Kiela, Shikib Mehri

Papers citing "Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment"

13 papers shown

Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag, Norbert Tihanyi, Merouane Debbah
ELM, OffRL, LRM, AI4CE
26 Mar 2025

InCo-DPO: Balancing Distribution Shift and Data Quality for Enhanced Preference Optimization
Yunan Wang, Jijie Li, Bo Zhang, Liangdong Wang, Guang Liu
20 Mar 2025

Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Austin Xu, Srijan Bansal, Yifei Ming, Semih Yavuz, Shafiq R. Joty
ELM
19 Mar 2025

Aligning to What? Limits to RLHF Based Alignment
Logan Barnhart, Reza Akbarian Bafghi, Stephen Becker, M. Raissi
12 Mar 2025

C-3DPO: Constrained Controlled Classification for Direct Preference Optimization
Kavosh Asadi, Julien Han, Xingzi Xu, Dominique Perrault-Joncas, Shoham Sabach, Karim Bouyarmane, Mohammad Ghavamzadeh
22 Feb 2025

Bayesian scaling laws for in-context learning
Aryaman Arora, Dan Jurafsky, Christopher Potts, Noah D. Goodman
21 Oct 2024

RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking
Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee
AAML
26 Sep 2024

Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits
Tuhin Chakrabarty, Philippe Laban, C. Wu
22 Sep 2024

A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Zhichao Wang, Bin Bi, Shiva K. Pentyala, Kiran Ramnath, Sougata Chaudhuri, ..., Z. Zhu, Xiang-Bo Mao, S. Asur, Na Cheng
OffRL
23 Jul 2024

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Hassan Awadallah, Tengyang Xie
04 Apr 2024

Direct Preference Optimization with an Offset
Afra Amini, Tim Vieira, Ryan Cotterell
16 Feb 2024

KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela
02 Feb 2024

Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta
16 Oct 2021