arXiv: 1909.08593 (v2, latest)
Fine-Tuning Language Models from Human Preferences
18 September 2019
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
Tags: ALM
Papers citing "Fine-Tuning Language Models from Human Preferences" (showing 50 of 1,265)
Domain Adaptation of Echocardiography Segmentation Via Reinforcement Learning (25 Jun 2024) [OOD]
   Arnaud Judge, Thierry Judge, Nicolas Duchateau, Roman A. Sandler, Joseph Z. Sokol, Olivier Bernard, Pierre-Marc Jodoin
Aligning Diffusion Models with Noise-Conditioned Perception (25 Jun 2024)
   Alexander Gambashidze, Anton Kulikov, Yuriy Sosnin, Ilya Makarov
Multi-property Steering of Large Language Models with Dynamic Activation Composition (25 Jun 2024) [KELM, LLMSV, AI4CE]
   Daniel Scalena, Gabriele Sarti, Malvina Nissim
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models (24 Jun 2024)
   Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui
WARP: On the Benefits of Weight Averaged Rewarded Policies (24 Jun 2024)
   Alexandre Ramé, Johan Ferret, Nino Vieillard, Robert Dadashi, Léonard Hussenot, Pierre-Louis Cedoz, Pier Giuseppe Sessa, Sertan Girgin, Arthur Douillard, Olivier Bachem
Towards a Science Exocortex (24 Jun 2024)
   Kevin G. Yager
Towards Comprehensive Preference Data Collection for Reward Modeling (24 Jun 2024) [SyDa]
   Yulan Hu, Qingyang Li, Sheng Ouyang, Ge Chen, Kaihui Chen, Lijun Mei, Xucheng Ye, Fuzheng Zhang, Yong Liu
Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models? (24 Jun 2024)
   Yuu Jinnai
PORT: Preference Optimization on Reasoning Traces (23 Jun 2024) [LRM]
   Salem Lahlou, Abdalgader Abubaker, Hakim Hacid
SAIL: Self-Improving Efficient Online Alignment of Large Language Models (21 Jun 2024)
   Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Singh Bedi, Furong Huang
A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick (21 Jun 2024)
   Nishant Balepur, Matthew Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, Jordan Boyd-Graber
Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue (20 Jun 2024) [OffRL]
   Huifang Du, Shuqin Li, Minghao Wu, Xuejing Feng, Yuan-Fang Li, Haofen Wang
Timo: Towards Better Temporal Reasoning for Language Models (20 Jun 2024) [LRM]
   Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng
Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective (20 Jun 2024)
   Muhammad Aneeq uz Zaman, Mathieu Laurière, Alec Koppel, Tamer Basar
What Teaches Robots to Walk, Teaches Them to Trade too -- Regime Adaptive Execution using Informed Data and LLMs (20 Jun 2024)
   Raeid Saqur
Global Human-guided Counterfactual Explanations for Molecular Properties via Reinforcement Learning (19 Jun 2024) [AI4CE]
   Danqing Wang, Antonis Antoniades, Kha-Dinh Luong, Edwin Zhang, Mert Kosan, Jiachen Li, Ambuj Singh, William Yang Wang, Lei Li
Towards Minimal Targeted Updates of Language Models with Targeted Negative Training (19 Jun 2024)
   Lily H. Zhang, Rajesh Ranganath, Arya Tafvizi
Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback (18 Jun 2024) [OffRL]
   Zhirui Chen, Vincent Y. F. Tan
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner (17 Jun 2024)
   Kenneth Li, Yiming Wang, Fernanda Viégas, Martin Wattenberg
A Survey on Human Preference Learning for Large Language Models (17 Jun 2024)
   Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle (17 Jun 2024) [ALM]
   Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo Wang, Qi Zhang, Liang Ding, Dacheng Tao
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model (17 Jun 2024) [VLM]
   Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, ..., Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao
Effective Generative AI: The Human-Algorithm Centaur (16 Jun 2024)
   S. Saghafian, Lihi Idan
Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (16 Jun 2024)
   Yuping Lin, Pengfei He, Han Xu, Yue Xing, Makoto Yamada, Hui Liu, Jiliang Tang
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning (15 Jun 2024)
   Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy T. Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs (14 Jun 2024)
   Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning (14 Jun 2024) [OffRL]
   Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar
Deep Bayesian Active Learning for Preference Modeling in Large Language Models (14 Jun 2024)
   Luckeciano C. Melo, P. Tigas, Alessandro Abate, Yarin Gal
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms (13 Jun 2024) [VLM]
   Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, ..., Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo
Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models (13 Jun 2024)
   Sarah Ball, Frauke Kreuter, Nina Rimsky
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback (13 Jun 2024)
   Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions (13 Jun 2024)
   Xu Zhang, Xunjian Yin, Xiaojun Wan
Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets (12 Jun 2024)
   Duanyu Feng, Bowen Qin, Chen Huang, Youcheng Huang, Zheng Zhang, Wenqiang Lei
Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey (12 Jun 2024) [PILM]
   Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Xu Guo, Dayong Ye, Wanlei Zhou, Philip S. Yu
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF (12 Jun 2024)
   Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao
OPTune: Efficient Online Preference Tuning (11 Jun 2024)
   Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Dinesh Manocha, Heng Huang
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis (11 Jun 2024) [OffRL]
   Qining Zhang, Honghao Wei, Lei Ying
Multi-objective Reinforcement Learning from AI Feedback (11 Jun 2024)
   Marcus Williams
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees (11 Jun 2024) [LLMAG, LRM]
   Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward (11 Jun 2024)
   Yuzi Yan, Yibo Miao, J. Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (10 Jun 2024)
   Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong
Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity (10 Jun 2024)
   Calarina Muslimani, Bram Grooten, Deepak Ranganatha Sastry Mamillapalli, Mykola Pechenizkiy, Decebal Constantin Mocanu, Matthew E. Taylor
Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing (08 Jun 2024)
   Biqing Qi, Pengfei Li, Fangyuan Li, Junqi Gao, Kaiyan Zhang, Bowen Zhou
A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques (07 Jun 2024) [ALM]
   Megh Thakkar, Quentin Fournier, Matthew D Riemer, Pin-Yu Chen, Payel Das, Sarath Chandar
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning (07 Jun 2024)
   Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Guohao Li
Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF (06 Jun 2024)
   Yuan Sun, Navid Salami Pargoo, Peter J. Jin, Jorge Ortiz
Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs (06 Jun 2024)
   Shang Zhou, Feng Yao, Chengyu Dong, Zihan Wang, Jingbo Shang
Open-Endedness is Essential for Artificial Superhuman Intelligence (06 Jun 2024) [LRM]
   Edward Hughes, Michael Dennis, Jack Parker-Holder, Feryal M. P. Behbahani, Aditi Mavalankar, Yuge Shi, Tom Schaul, Tim Rocktaschel
Aligning Agents like Large Language Models (06 Jun 2024) [LM&Ro]
   Adam Jelley, Yuhan Cao, Dave Bignell, Sam Devlin, Tabish Rashid
Prototypical Reward Network for Data-Efficient RLHF (06 Jun 2024) [ALM]
   Jinghan Zhang, Xiting Wang, Yiqiao Jin, Changyu Chen, Xinhao Zhang, Kunpeng Liu