Fine-Tuning Language Models from Human Preferences
arXiv:1909.08593 (v2, latest). 18 September 2019.
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving. [ALM]
Papers citing "Fine-Tuning Language Models from Human Preferences" (50 of 1,265 papers shown)
Hummer: Towards Limited Competitive Preference Dataset. Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, ZuJie Wen, Jun Zhou, Xiaotie Deng. 19 May 2024.
Sociotechnical Implications of Generative Artificial Intelligence for Information Access. Bhaskar Mitra, Henriette Cramer, Olya Gurevich. 19 May 2024.
From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT. Jace Grandinetti, R. Mcbeth. 17 May 2024. [AI4CE, LRM, LM&MA]
Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities. Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, ..., Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu. 17 May 2024.
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning. Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, ..., Alane Suhr, Saining Xie, Yann LeCun, Yi-An Ma, Sergey Levine. 16 May 2024. [LLMAG, LRM]
Human-AI Safety: A Descendant of Generative AI and Control Systems Safety. Andrea V. Bajcsy, J. F. Fisac. 16 May 2024.
NIFTY Financial News Headlines Dataset. Raeid Saqur, Ken Kato, Nicholas Vinden, Frank Rudzicz. 16 May 2024. [AIFin]
TFWT: Tabular Feature Weighting with Transformer. Xinhao Zhang, Zaitian Wang, Lu Jiang, Wanfu Gao, Pengfei Wang, Kunpeng Liu. 14 May 2024. [LMTD]
RLHF Workflow: From Reward Modeling to Online RLHF. Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang. 13 May 2024. [OffRL]
OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs. Mihai Masala, Denis C. Ilie-Ablachim, D. Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea. 13 May 2024.
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation. JoonHo Lee, Jae Oh Woo, Juree Seok, Parisa Hassanzadeh, Wooseok Jang, ..., Hankyu Moon, Wenjun Hu, Yeong-Dae Kwon, Taehee Lee, Seungjai Min. 10 May 2024.
Truthful Aggregation of LLMs with an Application to Online Advertising. Ermis Soumalias, Michael J. Curry, Sven Seuken. 09 May 2024.
AffirmativeAI: Towards LGBTQ+ Friendly Audit Frameworks for Large Language Models. Yinru Long, Zilin Ma, Yiyang Mei, Zhaoyuan Su. 07 May 2024. [AI4MH]
Optimizing Language Model's Reasoning Abilities with Weak Supervision. Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang. 07 May 2024. [LRM, ReLM]
Reinforcement Learning-Guided Semi-Supervised Learning. Marzi Heidari, Hanping Zhang, Yuhong Guo. 02 May 2024. [OffRL]
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models. Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li. 01 May 2024. [EGVM]
Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning. Lucas-Andrei Thil, Mirela Popa, Gerasimos Spanakis. 01 May 2024. [LLMAG]
RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation. Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman Ozdaglar. 30 Apr 2024.
Iterative Reasoning Preference Optimization. Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston. 30 Apr 2024. [LRM]
Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning. Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin. 30 Apr 2024.
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness. Aaron Jiaxun Li, Satyapriya Krishna, Himabindu Lakkaraju. 29 Apr 2024.
Performance-Aligned LLMs for Generating Fast Code. Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, T. Gamblin, A. Bhatele. 29 Apr 2024.
A Framework for Real-time Safeguarding the Text Generation of Large Language Model. Ximing Dong, Dayi Lin, Shaowei Wang, Ahmed E. Hassan. 29 Apr 2024.
DPO Meets PPO: Reinforced Token Optimization for RLHF. Han Zhong, Zikang Shan, Guhao Feng, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang. 29 Apr 2024.
Paint by Inpaint: Learning to Add Image Objects by Removing Them First. Navve Wasserman, Noam Rotstein, Roy Ganz, Ron Kimmel. 28 Apr 2024. [DiffM]
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo. Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger C. Grosse. 26 Apr 2024.
When to Trust LLMs: Aligning Confidence with Response Quality. Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, Bolin Ding. 26 Apr 2024.
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs. Valeriia Cherepanova, James Zou. 26 Apr 2024. [AAML]
REBEL: Reinforcement Learning via Regressing Relative Rewards. Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun. 25 Apr 2024. [OffRL]
Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare. Emre Can Acikgoz, Osman Batur İnce, Rayene Bench, Arda Anil Boz, İlker Kesen, Aykut Erdem, Erkut Erdem. 25 Apr 2024. [LM&MA]
Model Extrapolation Expedites Alignment. Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng. 25 Apr 2024. [MoMe]
Don't Say No: Jailbreaking LLM by Suppressing Refusal. Yukai Zhou, Jian Lou, Zhijie Huang, Zhan Qin, Yibei Yang, Wenjie Wang. 25 Apr 2024. [AAML]
A Human-Computer Collaborative Tool for Training a Single Large Language Model Agent into a Network through Few Examples. Lihang Pan, Yuxuan Li, Chun Yu, Yuanchun Shi. 24 Apr 2024. [LLMAG]
Aligning LLM Agents by Learning Latent Preference from User Edits. Ge Gao, Alexey Taymanov, Eduardo Salinas, Paul Mineiro, Dipendra Kumar Misra. 23 Apr 2024. [LLMAG]
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data. Fahim Tajwar, Anika Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar. 22 Apr 2024.
Generating Attractive and Authentic Copywriting from Customer Reviews. Yu-Xiang Lin, Wei-Yun Ma. 22 Apr 2024.
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs. Anselm Paulus, Arman Zharmagambetov, Chuan Guo, Brandon Amos, Yuandong Tian. 21 Apr 2024. [AAML]
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment. Zhaofeng Wu, Ananth Balashankar, Yoon Kim, Jacob Eisenstein, Ahmad Beirami. 18 Apr 2024.
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data. Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, Yasiru Ratnayake. 18 Apr 2024. [MoE, ALM]
RAM: Towards an Ever-Improving Memory System by Learning from Communications. Jiaqi Li, Xiaobo Wang, Wentao Ding, Zihao Wang, Yipeng Kang, Zixia Jia, Zilong Zheng. 18 Apr 2024.
Stepwise Alignment for Constrained Language Model Policy Optimization. Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Yohei Akimoto. 17 Apr 2024.
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study. Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weiling Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu. 16 Apr 2024.
Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback. Vincent Conitzer, Rachel Freedman, J. Heitzig, Wesley H. Holliday, Bob M. Jacobs, ..., Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, W. Zwicker. 16 Apr 2024.
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs. Ruoxi Cheng, Haoxuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo. 15 Apr 2024. [AI4CE]
Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection. Yuxi Li, Yi Liu, Gelei Deng, Ying Zhang, Wenjia Song, Ling Shi, Kailong Wang, Yuekang Li, Yang Liu, Haoyu Wang. 15 Apr 2024.
Exploring Text-to-Motion Generation with Human Preference. Jenny Sheng, Matthieu Lin, Andrew Zhao, Kevin Pruvost, Yu-Hui Wen, Yangguang Li, Gao Huang, Yong-Jin Liu. 15 Apr 2024. [VGen]
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies. Benjue Weng. 13 Apr 2024. [LM&MA]
CuriousLLM: Elevating Multi-Document Question Answering with LLM-Enhanced Knowledge Graph Reasoning. Zukang Yang, Zixuan Zhu, Xuan Zhu. 13 Apr 2024. [RALM]
Hindsight PRIORs for Reward Learning from Human Preferences. Mudit Verma, Katherine Metcalf. 12 Apr 2024.
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs. Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva. 12 Apr 2024.