arXiv: 1909.08593
Fine-Tuning Language Models from Human Preferences
18 September 2019
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving
[ALM]

Papers citing "Fine-Tuning Language Models from Human Preferences" (showing 50 of 1,265):

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
  Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang (01 Dec 2023)
Sample Efficient Preference Alignment in LLMs via Active Exploration
  Viraj Mehta, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Willie Neiswanger, Stefano Ermon, Jeff Schneider (01 Dec 2023) [OffRL]
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
  Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiao-wen Dong, Jiaqi Wang, Conghui He (28 Nov 2023) [MLLM, VLM]
Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
  Nianwen Si, Hao Zhang, Heyu Chang, Wenlin Zhang, Dan Qu, Weiqiang Zhang (27 Nov 2023) [KELM, MU]
Justifiable Artificial Intelligence: Engineering Large Language Models for Legal Applications
  Sabine Wehnert (27 Nov 2023) [AILaw]
Universal Jailbreak Backdoors from Poisoned Human Feedback
  Javier Rando, Florian Tramèr (24 Nov 2023)
Reinforcement Learning from Statistical Feedback: the Journey from AB Testing to ANT Testing
  Feiyang Han, Yimin Wei, Zhaofeng Liu, Yanxing Qi (24 Nov 2023)
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
  Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li (22 Nov 2023) [EGVM]
A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift
  Will LeVine, Benjamin Pikus, Tony Chen, Sean Hendryx (21 Nov 2023)
Behavior Optimized Image Generation
  Varun Khurana, Yaman Kumar Singla, J. Subramanian, R. Shah, Changyou Chen, Zhiqiang Xu, Balaji Krishnamurthy (18 Nov 2023) [EGVM]
RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
  Jiong Wang, Junlin Wu, Muhao Chen, Yevgeniy Vorobeychik, Chaowei Xiao (16 Nov 2023) [AAML]
Simulating Opinion Dynamics with Networks of LLM-based Agents
  Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers (16 Nov 2023) [AI4CE]
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation
  Yikun Wang, Rui Zheng, Haoming Li, Qi Zhang, Tao Gui, Fei Liu (15 Nov 2023) [OffRL]
Towards Evaluating AI Systems for Moral Status Using Self-Reports
  Ethan Perez, Robert Long (14 Nov 2023) [ELM]
A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily
  Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, Shujian Huang (14 Nov 2023) [AAML]
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
  Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Tianhao Hu, Peixin Cao, Nan Du, Xiaolong Li (14 Nov 2023)
Understanding Users' Dissatisfaction with ChatGPT Responses: Types, Resolving Tactics, and the Effect of Knowledge Level
  Yoonsu Kim, Jueon Lee, Seoyoung Kim, Jaehyuk Park, Juho Kim (13 Nov 2023)
Controlled Text Generation for Black-box Language Models via Score-based Progressive Editor
  Sangwon Yu, Changmin Lee, Hojin Lee, Sungroh Yoon (13 Nov 2023)
Online Advertisements with LLMs: Opportunities and Challenges
  Soheil Feizi, Mohammadtaghi Hajiaghayi, Keivan Rezaei, Suho Shin (11 Nov 2023) [OffRL]
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
  Joey Hong, Sergey Levine, Anca Dragan (09 Nov 2023) [OffRL, LLMAG]
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
  Vinodkumar Prabhakaran, Christopher Homan, Lora Aroyo, Aida Mostafazadeh Davani, Alicia Parrish, Alex S. Taylor, Mark Díaz, Ding Wang, Greg Serapio-García (09 Nov 2023)
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
  Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez (08 Nov 2023) [LRM]
Evaluating multiple large language models in pediatric ophthalmology
  J. Holmes, Rui Peng, Yiwei Li, Jinyu Hu, Zheng Liu, ..., Wei Liu, Hong Wei, Jie Zou, Tianming Liu, Yi Shao (07 Nov 2023) [AI4Ed, ELM, LM&MA]
Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features
  Diogo Cruz, Edoardo Pona, Alex Holness-Tofts, Elias Schmied, Víctor Abia Alonso, Charlie Griffin, B. Cirstea (07 Nov 2023)
Benefits and Harms of Large Language Models in Digital Mental Health
  Munmun De Choudhury, Sachin R. Pendse, Neha Kumar (07 Nov 2023) [LM&MA, AI4MH]
Can LLMs Follow Simple Rules?
  Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David Wagner (06 Nov 2023) [ALM]
AI-TA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs
  Yann Hicke, Anmol Agarwal, Qianou Ma, Paul Denny (05 Nov 2023) [AI4Ed]
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
  Jonathan Colaco Carr, Prakash Panangaden, Doina Precup (03 Nov 2023)
Leveraging Large Language Models for Collective Decision-Making
  Marios Papachristou, Longqi Yang, Chin-Chia Hsu (03 Nov 2023) [LLMAG]
Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation
  Wanyu Du, Yangfeng Ji (02 Nov 2023)
The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis
  Yuxiang Zhou, Jiazheng Li, Yanzheng Xiang, Hanqi Yan, Lin Gui, Yulan He (01 Nov 2023)
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions
  Taehyeon Kim, Joonkee Kim, Gihun Lee, Se-Young Yun (01 Nov 2023)
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
  Nathan Lambert, Roberto Calandra (31 Oct 2023) [ALM]
Vanishing Gradients in Reinforcement Finetuning of Language Models
  Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Josh Susskind, Etai Littwin (31 Oct 2023)
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
  Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish (31 Oct 2023) [ALM]
Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
  Prakamya Mishra, Zonghai Yao, Shuwei Chen, Beining Wang, Rohan Mittal, Hong-ye Yu (30 Oct 2023) [KELM, ALM, HILM]
Preventing Language Models From Hiding Their Reasoning
  Fabien Roger, Ryan Greenblatt (27 Oct 2023) [LRM]
Expanding the Set of Pragmatic Considerations in Conversational AI
  S. M. Seals, V. Shalin (27 Oct 2023)
How well can machine-generated texts be identified and can language models be trained to avoid identification?
  Sinclair Schneider, Florian Steuber, João A. G. Schneider, Gabi Dreo Rodosek (25 Oct 2023) [DeLMO]
A Multilingual Virtual Guide for Self-Attachment Technique
  Alicia Jiayun Law, Ruoyu Hu, Lisa Alazraki, Anandha Gopalan, Neophytos Polydorou, A. Edalat (25 Oct 2023)
Knowledge Editing for Large Language Models: A Survey
  Song Wang, Yaochen Zhu, Haochen Liu, Zaiyi Zheng, Chen Chen, Wenlin Yao (24 Oct 2023) [KELM]
AI Alignment and Social Choice: Fundamental Limitations and Policy Implications
  Abhilash Mishra (24 Oct 2023)
Data-driven Traffic Simulation: A Comprehensive Review
  Di Chen, Meixin Zhu, Heng Yang, Xuesong Wang, Yinhai Wang (24 Oct 2023)
Self-Guard: Empower the LLM to Safeguard Itself
  Zezhong Wang, Fangkai Yang, Lu Wang, Pu Zhao, Hongru Wang, Liang Chen, Qingwei Lin, Kam-Fai Wong (24 Oct 2023)
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
  Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Wang, Rui Yan (24 Oct 2023)
Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation
  Tianqi Zhong, Quan Wang, Jingxuan Han, Yongdong Zhang, Zhendong Mao (23 Oct 2023)
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
  M. Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, Sara Hooker (22 Oct 2023) [ALM]
Contrastive Preference Learning: Learning from Human Feedback without RL
  Joey Hejna, Rafael Rafailov, Harshit S. Sikchi, Chelsea Finn, S. Niekum, W. B. Knox, Dorsa Sadigh (20 Oct 2023) [OffRL]
Teaching Language Models to Self-Improve through Interactive Demonstrations
  Xiao Yu, Baolin Peng, Michel Galley, Jianfeng Gao, Zhou Yu (20 Oct 2023) [LRM, ReLM]
Automated Repair of Declarative Software Specifications in the Era of Large Language Models
  Md Rashedul Hasan, Jiawei Li, Iftekhar Ahmed, Hamid Bagheri (19 Oct 2023)