arXiv: 2305.18290 · Cited By
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
29 May 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
Papers citing
"Direct Preference Optimization: Your Language Model is Secretly a Reward Model"
Showing 50 of 2,637 citing papers
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MLLM
50
42
0
14 Mar 2024
Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models
Laura Fernández-Becerra
Miguel Ángel González Santamarta
Ángel Manuel Guerrero Higueras
Francisco J. Rodríguez-Lera
Vicente Matellán Olivera
41
0
0
14 Mar 2024
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Zhiqing Sun
Longhui Yu
Yikang Shen
Weiyang Liu
Yiming Yang
Sean Welleck
Chuang Gan
36
55
0
14 Mar 2024
TaxoLLaMA: WordNet-based Model for Solving Multiple Lexical Semantic Tasks
Viktor Moskvoretskii
Ekaterina Neminova
Alina Lobanova
Alexander Panchenko
Irina Nikishina
45
6
0
14 Mar 2024
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences
Martin Weyssow
Aton Kamanda
H. Sahraoui
ALM
72
33
0
14 Mar 2024
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
Renjie Pi
Tianyang Han
Wei Xiong
Jipeng Zhang
Runtao Liu
Rui Pan
Tong Zhang
MLLM
55
34
0
13 Mar 2024
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
Ruiyi Wang
Haofei Yu
W. Zhang
Zhengyang Qi
Maarten Sap
Graham Neubig
Yonatan Bisk
Hao Zhu
LLMAG
51
38
0
13 Mar 2024
Human Alignment of Large Language Models through Online Preference Optimisation
Daniele Calandriello
Daniel Guo
Rémi Munos
Mark Rowland
Yunhao Tang
...
Michal Valko
Tianqi Liu
Rishabh Joshi
Zeyu Zheng
Bilal Piot
52
60
0
13 Mar 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
108
42
0
13 Mar 2024
Tastle: Distract Large Language Models for Automatic Jailbreak Attack
Zeguan Xiao
Yan Yang
Guanhua Chen
Yun-Nung Chen
AAML
50
18
0
13 Mar 2024
HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback
Ang Li
Qiugen Xiao
Peng Cao
Jian Tang
Yi Yuan
...
Weidong Guo
Yukang Gan
Jeffrey Xu Yu
D. Wang
Ying Shan
VLM
ALM
44
10
0
13 Mar 2024
Learning to Watermark LLM-generated Text via Reinforcement Learning
Xiaojun Xu
Yuanshun Yao
Yang Liu
31
10
0
13 Mar 2024
MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension
Xingyu Lu
He Cao
Zijing Liu
Shengyuan Bai
Leqing Chen
Yuan Yao
Hai-Tao Zheng
Yu-Feng Li
HILM
26
7
0
13 Mar 2024
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
Minbin Huang
Yanxin Long
Xinchi Deng
Ruihang Chu
Jiangfeng Xiong
Xiaodan Liang
Hong Cheng
Qinglin Lu
Wei Liu
MLLM
EGVM
65
8
0
13 Mar 2024
Large Language Models are Contrastive Reasoners
Liang Yao
ReLM
ELM
LRM
50
2
0
13 Mar 2024
Authorship Style Transfer with Policy Optimization
Shuai Liu
Shantanu Agarwal
Jonathan May
42
6
0
12 Mar 2024
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu
Fangyun Wei
Yanye Lu
MLLM
VLM
57
18
0
12 Mar 2024
Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
Wei Shen
Xiaoying Zhang
Yuanshun Yao
Rui Zheng
Hongyi Guo
Yang Liu
ALM
40
12
0
12 Mar 2024
ORPO: Monolithic Preference Optimization without Reference Model
Jiwoo Hong
Noah Lee
James Thorne
OSLM
44
213
0
12 Mar 2024
Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences
Pulkit Pattnaik
Rishabh Maheshwary
Kelechi Ogueji
Vikas Yadav
Sathwik Tejaswi Madhusudhan
42
18
0
12 Mar 2024
(N,K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model
Yufeng Zhang
Liyu Chen
Boyi Liu
Yingxiang Yang
Qiwen Cui
Yunzhe Tao
Hongxia Yang
122
0
0
11 Mar 2024
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
Jialu Li
Jaemin Cho
Yi-Lin Sung
Jaehong Yoon
Mohit Bansal
MoMe
DiffM
52
8
0
11 Mar 2024
From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification
Fei Wang
Chao Shang
Sarthak Jain
Shuai Wang
Qiang Ning
Bonan Min
Vittorio Castelli
Yassine Benajiba
Dan Roth
ALM
27
8
0
10 Mar 2024
Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond
Wenpin Tang
48
13
0
10 Mar 2024
Bayesian Preference Elicitation with Language Models
Kunal Handa
Yarin Gal
Ellie Pavlick
Noah D. Goodman
Jacob Andreas
Alex Tamkin
Belinda Z. Li
42
12
0
08 Mar 2024
Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation
Xiaoying Zhang
Jean-François Ton
Wei Shen
Hongning Wang
Yang Liu
39
14
0
08 Mar 2024
A Survey on Human-AI Teaming with Large Pre-Trained Models
Vanshika Vats
Marzia Binta Nizam
Minghao Liu
Ziyuan Wang
Richard Ho
...
Celeste Shen
Rachel Shen
Nafisa Hussain
Kesav Ravichandran
James Davis
LM&MA
65
8
0
07 Mar 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
150
512
0
07 Mar 2024
Teaching Large Language Models to Reason with Reinforcement Learning
Alex Havrilla
Yuqing Du
Sharath Chandra Raparthy
Christoforos Nalmpantis
Jane Dwivedi-Yu
Maksym Zhuravinskyi
Eric Hambro
Sainbayar Sukhbaatar
Roberta Raileanu
ReLM
LRM
39
71
0
07 Mar 2024
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye
Zitong Yu
Rui Shao
Xinyu Xie
Philip Torr
Xiaochun Cao
MLLM
58
24
0
07 Mar 2024
Explaining Bayesian Optimization by Shapley Values Facilitates Human-AI Collaboration
Julian Rodemann
Federico Croppi
Philipp Arens
Yusuf Sale
J. Herbinger
B. Bischl
Eyke Hüllermeier
Thomas Augustin
Conor J. Walsh
Giuseppe Casalicchio
46
7
0
07 Mar 2024
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
Yu Zhu
Chuxiong Sun
Wenfei Yang
Wenqiang Wei
Simin Niu
...
Zhiyu Li
Shifeng Zhang
Feiyu Xiong
Jie Hu
Mingchuan Yang
42
3
0
07 Mar 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang
Shitong Duan
Xiaoyuan Yi
Jing Yao
Shanlin Zhou
Zhihua Wei
Peng Zhang
Dongkuan Xu
Maosong Sun
Xing Xie
OffRL
50
16
0
07 Mar 2024
Preference optimization of protein language models as a multi-objective binder design paradigm
Pouria A. Mistani
Venkatesh Mysore
45
6
0
07 Mar 2024
SaulLM-7B: A pioneering Large Language Model for Law
Pierre Colombo
T. Pires
Malik Boudiaf
Dominic Culver
Rui Melo
...
Andre F. T. Martins
Fabrizio Esposito
Vera Lúcia Raposo
Sofia Morgado
Michael Desa
ELM
AILaw
54
66
0
06 Mar 2024
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization
Shitong Duan
Xiaoyuan Yi
Peng Zhang
Tun Lu
Xing Xie
Ning Gu
40
4
0
06 Mar 2024
Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations
Max Lamparth
Anthony Corso
Jacob Ganz
O. Mastro
Jacquelyn G. Schneider
Harold Trinkunas
54
7
0
06 Mar 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li
Alexander Pan
Anjali Gopal
Summer Yue
Daniel Berrios
...
Yan Shoshitaishvili
Jimmy Ba
K. Esvelt
Alexandr Wang
Dan Hendrycks
ELM
59
147
0
05 Mar 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
147
1,089
0
05 Mar 2024
CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following
Kaiyan Zhang
Jianyu Wang
Ermo Hua
Biqing Qi
Ning Ding
Bowen Zhou
SyDa
43
20
0
05 Mar 2024
"In Dialogues We Learn": Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning
Chuanqi Cheng
Quan Tu
Wei Wu
Shuo Shang
Cunli Mao
Zhengtao Yu
Rui Yan
49
2
0
05 Mar 2024
Data Augmentation using Large Language Models: Data Perspectives, Learning Paradigms and Challenges
Bosheng Ding
Chengwei Qin
Ruochen Zhao
Tianze Luo
Xinze Li
Guizhen Chen
Wenhan Xia
Junjie Hu
Anh Tuan Luu
Shafiq Joty
41
19
0
05 Mar 2024
Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering
Sungho Ko
Hyunjin Cho
Hyungjoo Chae
Jinyoung Yeo
Dongha Lee
RALM
HILM
24
7
0
05 Mar 2024
CURATRON: Complete Robust Preference Data for Robust Alignment of Large Language Models
S. Nguyen
Uma-Naresh Niranjan
Theja Tulabandhula
46
0
0
05 Mar 2024
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan
Zhixiang Liang
Zifan Ying
Daniel Kang
LLMAG
57
76
0
05 Mar 2024
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
Aly M. Kassem
Omar Mahmoud
Niloofar Mireshghallah
Hyunwoo J. Kim
Yulia Tsvetkov
Yejin Choi
Sherif Saad
Santu Rana
55
19
0
05 Mar 2024
Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF
Chen Zheng
Ke Sun
Hang Wu
Chenguang Xi
Xun Zhou
60
12
0
04 Mar 2024
Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
Yifan Song
Da Yin
Xiang Yue
Jie Huang
Sujian Li
Bill Yuchen Lin
45
68
0
04 Mar 2024
Enhancing LLM Safety via Constrained Direct Preference Optimization
Zixuan Liu
Xiaolin Sun
Zizhan Zheng
48
20
0
04 Mar 2024
Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models
Dimitris Papadimitriou
Daniel S. Brown
53
1
0
04 Mar 2024