Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
arXiv:2408.06266 · 12 August 2024
Karel D'Oosterlinck, Winnie Xu, Chris Develder, Thomas Demeester, A. Singh, Christopher Potts, Douwe Kiela, Shikib Mehri

Papers citing "Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment" (13 papers shown)

Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag, Norbert Tihanyi, Merouane Debbah
ELM, OffRL, LRM, AI4CE · 152 · 2 · 0 · 26 Mar 2025

InCo-DPO: Balancing Distribution Shift and Data Quality for Enhanced Preference Optimization
Yunan Wang, Jijie Li, Bo Zhang, Liangdong Wang, Guang Liu
63 · 0 · 0 · 20 Mar 2025

Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Austin Xu, Srijan Bansal, Yifei Ming, Semih Yavuz, Chenyu You
ELM · 95 · 3 · 0 · 19 Mar 2025

Aligning to What? Limits to RLHF Based Alignment
Logan Barnhart, Reza Akbarian Bafghi, Stephen Becker, M. Raissi
47 · 1 · 0 · 12 Mar 2025

C-3DPO: Constrained Controlled Classification for Direct Preference Optimization
Kavosh Asadi, Julien Han, Xingzi Xu, Dominique Perrault-Joncas, Shoham Sabach, Karim Bouyarmane, Mohammad Ghavamzadeh
34 · 0 · 0 · 22 Feb 2025

Bayesian scaling laws for in-context learning
Aryaman Arora, Dan Jurafsky, Christopher Potts, Noah D. Goodman
29 · 2 · 0 · 21 Oct 2024

RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking
Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee
AAML · 56 · 10 · 0 · 26 Sep 2024

Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits
Tuhin Chakrabarty, Philippe Laban, C. Wu
52 · 8 · 0 · 22 Sep 2024

A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Zhichao Wang, Bin Bi, Shiva K. Pentyala, Kiran Ramnath, Sougata Chaudhuri, ..., Z. Zhu, Xiang-Bo Mao, S. Asur, Na Cheng
OffRL · 42 · 40 · 0 · 23 Jul 2024

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Hassan Awadallah, Tengyang Xie
152 · 114 · 0 · 04 Apr 2024

Direct Preference Optimization with an Offset
Afra Amini, Tim Vieira, Ryan Cotterell
73 · 55 · 0 · 16 Feb 2024

KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela
182 · 456 · 0 · 02 Feb 2024

Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta
167 · 157 · 0 · 16 Oct 2021