Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
arXiv 2305.18290, 29 May 2023. [ALM]
Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (showing 50 of 2,637):
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies. Jonathan Colaco Carr, Prakash Panangaden, Doina Precup. 03 Nov 2023.
ChipNeMo: Domain-Adapted LLMs for Chip Design. Mingjie Liu, Teodor-Dumitru Ene, Robert M. Kirby, Chris Cheng, N. Pinckney, ..., Pratik P Suthar, Varun Tej, Walker J. Turner, Kaizhe Xu, Haoxin Ren. 31 Oct 2023.
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback. Nathan Lambert, Roberto Calandra. 31 Oct 2023. [ALM]
Vanishing Gradients in Reinforcement Finetuning of Language Models. Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Josh Susskind, Etai Littwin. 31 Oct 2023.
Learning From Mistakes Makes LLM Better Reasoner. Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen. 31 Oct 2023. [LRM]
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B. Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish. 31 Oct 2023. [ALM]
The Expressibility of Polynomial based Attention Scheme. Zhao Song, Guangyi Xu, Junze Yin. 30 Oct 2023.
Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization. Prakamya Mishra, Zonghai Yao, Shuwei Chen, Beining Wang, Rohan Mittal, Hong-ye Yu. 30 Oct 2023. [KELM, ALM, HILM]
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation. Hailin Chen, Amrita Saha, Steven C. H. Hoi, Chenyu You. 28 Oct 2023.
Fine-Tuning Language Models Using Formal Methods Feedback. Yunhao Yang, N. Bhatt, Tyler Ingebrand, William Ward, Steven Carr, Zhangyang Wang, Ufuk Topcu. 27 Oct 2023.
Looping in the Human: Collaborative and Explainable Bayesian Optimization. Masaki Adachi, Brady Planden, David A. Howey, Michael A. Osborne, Sebastian Orbell, Natalia Ares, Krikamol Muandet, Siu Lun Chau. 26 Oct 2023.
Controlled Decoding from Language Models. Sidharth Mudgal, Jong Lee, H. Ganapathy, Yaguang Li, Tao Wang, ..., Michael Collins, Trevor Strohman, Jilin Chen, Alex Beutel, Ahmad Beirami. 25 Oct 2023.
Conditionally Combining Robot Skills using Large Language Models. K.R. Zentner, Ryan Julian, Brian Ichter, Gaurav Sukhatme. 25 Oct 2023.
Zephyr: Direct Distillation of LM Alignment. Lewis Tunstall, E. Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, ..., Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf. 25 Oct 2023. [ALM]
SuperHF: Supervised Iterative Learning from Human Feedback. Gabriel Mukobi, Peter Chatain, Su Fong, Robert Windesheim, Gitta Kutyniok, Kush S. Bhatia, Silas Alberti. 25 Oct 2023. [ALM]
BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories? Xingmeng Zhao, Tongnian Wang, Sheri Osborn, Anthony Rios. 25 Oct 2023.
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment. Jixiang Hong, Quan Tu, C. Chen, Xing Gao, Ji Zhang, Rui Yan. 25 Oct 2023. [ALM]
SoK: Memorization in General-Purpose Large Language Models. Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David Evans, Shruti Tople, Robert West. 24 Oct 2023. [KELM, LLMAG]
COPR: Continual Learning Human Preference through Optimal Policy Regularization. Han Zhang, Lin Gui, Yuanzhao Zhai, Hui Wang, Yu Lei, Ruifeng Xu. 24 Oct 2023. [CLL]
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation. M. Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, Sara Hooker. 22 Oct 2023. [ALM]
22 Oct 2023
Vision Language Models in Autonomous Driving: A Survey and Outlook
Xingcheng Zhou
Mingyu Liu
Ekim Yurtsever
B. L. Žagar
Walter Zimmer
Hu Cao
Alois C. Knoll
VLM
44
39
0
22 Oct 2023
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj
Soujanya Poria
ALM
57
16
0
22 Oct 2023
Contrastive Preference Learning: Learning from Human Feedback without RL. Joey Hejna, Rafael Rafailov, Harshit S. Sikchi, Chelsea Finn, S. Niekum, W. B. Knox, Dorsa Sadigh. 20 Oct 2023. [OffRL]
An Emulator for Fine-Tuning Large Language Models using Small Language Models. Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning. 19 Oct 2023. [ALM]
Safe RLHF: Safe Reinforcement Learning from Human Feedback. Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang. 19 Oct 2023.
Preference Optimization for Molecular Language Models. Ryan Park, Ryan Theisen, Navriti Sahni, Marcel Patek, Anna Cichońska, Rayees Rahman. 18 Oct 2023.
A General Theoretical Paradigm to Understand Learning from Human Preferences. M. G. Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos. 18 Oct 2023.
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning. Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Shihan Dou, ..., Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang. 18 Oct 2023.
Emptying the Ocean with a Spoon: Should We Edit Models? Yuval Pinter, Michael Elhadad. 18 Oct 2023. [KELM]
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting. Guande He, Peng Cui, Jianfei Chen, Wenbo Hu, Jun Zhu. 18 Oct 2023.
Group Preference Optimization: Few-Shot Alignment of Large Language Models. Siyan Zhao, John Dang, Aditya Grover. 17 Oct 2023.
Compositional preference models for aligning LMs. Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Marc Dymetman. 17 Oct 2023.
17 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
27
52
0
16 Oct 2023
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis. Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, ..., Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu. 16 Oct 2023.
16 Oct 2023
Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model
Qichen Ye
Junling Liu
Dading Chong
Peilin Zhou
Yining Hua
...
Meng Cao
Ziming Wang
Xuxin Cheng
Andrew Liu
Zhenhua Guo
AI4MH
LM&MA
ELM
35
20
0
13 Oct 2023
Understanding and Controlling a Maze-Solving Policy Network. Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, M. MacDiarmid, Alexander Matt Turner. 12 Oct 2023.
Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles. Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Ziran Wang. 12 Oct 2023.
An Information Bottleneck Characterization of the Understanding-Workload Tradeoff. Lindsay M. Sanneman, Mycal Tucker, Julie A. Shah. 11 Oct 2023.
Evaluating Large Language Models at Evaluating Instruction Following. Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen. 11 Oct 2023. [ELM, ALM]
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values. Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale. 11 Oct 2023. [ALM]
KwaiYiiMath: Technical Report. Jia-Yi Fu, Lei Lin, Xiaoyang Gao, Pengli Liu, Zhengzong Chen, ..., Zijia Lin, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai. 11 Oct 2023. [LRM, ReLM, RALM]
Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models. Yuchong Sun, Che Liu, Kun Zhou, Jinwen Huang, Ruihua Song, Xin Zhao, Fuzheng Zhang, Di Zhang, Kun Gai. 11 Oct 2023. [LRM]
Understanding the Effects of RLHF on LLM Generalisation and Diversity. Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu. 10 Oct 2023. [AI4CE, ALM]
Constructive Large Language Models Alignment with Diverse Feedback. Tianshu Yu, Ting-En Lin, Yuchuan Wu, Min Yang, Fei Huang, Yongbin Li. 10 Oct 2023. [ALM]
DockGame: Cooperative Games for Multimeric Rigid Protein Docking. Vignesh Ram Somnath, Pier Giuseppe Sessa, María Rodríguez Martínez, Andreas Krause. 09 Oct 2023.
Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond. Hao Sun. 09 Oct 2023. [OffRL]
SALMON: Self-Alignment with Instructable Reward Models. Zhiqing Sun, Songlin Yang, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David D. Cox, Yiming Yang, Chuang Gan. 09 Oct 2023. [ALM, SyDa]
Improving Summarization with Human Edits. Zonghai Yao, Benjamin J Schloss, Sai P. Selvaraj. 09 Oct 2023.
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF. Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev. 09 Oct 2023. [ALM, LLMSV]
Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages. Shih-Cheng Huang, Pin-Zu Li, Yu-Chi Hsu, Kuang-Ming Chen, Yu Tung Lin, Shih-Kai Hsiao, Richard Tzong-Han Tsai, Hung-yi Lee. 07 Oct 2023. [MoMe]