Fine-Tuning Language Models from Human Preferences

18 September 2019
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving
ALM
ArXiv (abs) · PDF · HTML

Papers citing "Fine-Tuning Language Models from Human Preferences"

Showing 50 of 1,265 citing papers.

Aligning Crowd Feedback via Distributional Preference Reward Modeling
Dexun Li, Cong Zhang, Kuicai Dong, Derrick-Goh-Xin Deik, Ruiming Tang, Yong Liu
15 Feb 2024

InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling
Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao
14 Feb 2024

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences
Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang
ALM · 14 Feb 2024

Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji, Jiafan He, Quanquan Gu
14 Feb 2024

GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alex Havrilla, Sharath Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu
ReLM, LRM · 13 Feb 2024

Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
13 Feb 2024

PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Fei Deng, Qifei Wang, Wei Wei, Matthias Grundmann, Tingbo Hou
EGVM · 13 Feb 2024

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
Shentao Yang, Tianqi Chen, Mingyuan Zhou
EGVM · 13 Feb 2024

Active Preference Learning for Large Language Models
William Muldrew, Peter Hayes, Mingtian Zhang, David Barber
12 Feb 2024

Large Language Models as Agents in Two-Player Games
Yang Liu, Peng Sun, Hang Li
LLMAG · 12 Feb 2024

Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs
Víctor Gallego
SyDa · 12 Feb 2024

Policy Improvement using Language Feedback Models
Victor Zhong, Dipendra Kumar Misra, Xingdi Yuan, Marc-Alexandre Côté
12 Feb 2024

AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy
P. Schoenegger, Peter S. Park, Ezra Karger, P. Tetlock
12 Feb 2024

Mercury: A Code Efficiency Benchmark for Code Large Language Models
Mingzhe Du, Anh Tuan Luu, Bin Ji, Qian Liu, See-Kiong Ng
ALM, ELM, OffRL · 12 Feb 2024

Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning
Zhicheng Liu, Jian Lou, Wenxuan Bao, Yihan Hu, Baochun Li, Zhan Qin, K. Ren
12 Feb 2024

Show Me How It's Done: The Role of Explanations in Fine-Tuning Language Models
Mohamad Ballout, U. Krumnack, Gunther Heidemann, Kai-Uwe Kuehnberger
LRM · 12 Feb 2024

ODIN: Disentangled Reward Mitigates Hacking in RLHF
Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Dinesh Manocha, Tom Goldstein, Heng-Chiao Huang, Mohammad Shoeybi, Bryan Catanzaro
AAML · 11 Feb 2024

Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Chen Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang
OffRL · 11 Feb 2024

LiFi: Lightweight Controlled Text Generation with Fine-Grained Control Codes
Chufan Shi, Deng Cai, Yujiu Yang
10 Feb 2024

Corruption Robust Offline Reinforcement Learning with Human Feedback
Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović
OffRL · 09 Feb 2024

V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Rameswar Panda, Alessandro Sordoni, Rishabh Agarwal
ReLM, LRM · 09 Feb 2024

Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement
Muning Wen, Junwei Liao, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen
09 Feb 2024

Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen, Guande He, Lifan Yuan, Ganqu Cui, Hang Su, Jun Zhu
08 Feb 2024

Implicit Diffusion: Efficient Optimization through Stochastic Sampling
Pierre Marion, Anna Korba, Peter Bartlett, Mathieu Blondel, Valentin De Bortoli, Arnaud Doucet, Felipe Llinares-López, Courtney Paquette, Quentin Berthet
08 Feb 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson
AAML · 07 Feb 2024

Pedagogical Alignment of Large Language Models
Shashank Sonkar, Kangqi Ni, Sapana Chaudhary, Richard G. Baraniuk
AI4Ed · 07 Feb 2024

Direct Language Model Alignment from Online AI Feedback
Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, ..., Thomas Mesnard, Yao-Min Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel
ALM · 07 Feb 2024

MusicRL: Aligning Music Generation to Human Preferences
Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, ..., Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, A. Agostinelli
06 Feb 2024

Harnessing the Plug-and-Play Controller by Prompting
Hao Wang, Lei Sha
06 Feb 2024

Personalized Language Modeling from Personalized Human Feedback
Xinyu Li, Zachary C. Lipton, Liu Leqi
ALM · 06 Feb 2024

Psychological Assessments with Large Language Models: A Privacy-Focused and Cost-Effective Approach
Sergi Blanco-Cuaresma
05 Feb 2024

MobilityGPT: Enhanced Human Mobility Modeling with a GPT model
Ammar Haydari, Dongjie Chen, Zhengfeng Lai, Michael Zhang, Chen-Nee Chuah
05 Feb 2024

Best Practices for Text Annotation with Large Language Models
Petter Törnberg
05 Feb 2024

Preference-Conditioned Language-Guided Abstraction
Andi Peng, Andreea Bobu, Belinda Z. Li, T. Sumers, Ilia Sucholutsky, Nishanth Kumar, Thomas Griffiths, Julie A. Shah
05 Feb 2024

Decoding-time Realignment of Language Models
Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares-López, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel
AI4CE · 05 Feb 2024

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback
Gaurav Pandey, Yatin Nandwani, Tahira Naseem, Mayank Mishra, Guangxuan Xu, Dinesh Raghu, Sachindra Joshi, Asim Munawar, Ramón Fernández Astudillo
BDL · 04 Feb 2024

Aligner: Efficient Alignment by Learning to Correct
Jiaming Ji, Boyuan Chen, Hantao Lou, Chongye Guo, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang
04 Feb 2024

Jailbreaking Attack against Multimodal Large Language Model
Zhenxing Niu, Haoxuan Ji, Xinbo Gao, Gang Hua, Rong Jin
04 Feb 2024

Rethinking the Role of Proxy Rewards in Language Model Alignment
Sungdong Kim, Minjoon Seo
SyDa, ALM · 02 Feb 2024

KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela
02 Feb 2024

Efficient Prompt Caching via Embedding Similarity
Hanlin Zhu, Banghua Zhu, Jiantao Jiao
RALM · 02 Feb 2024

DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models
Mohammadreza Pourreza, Davood Rafiei
02 Feb 2024

Towards Efficient Exact Optimization of Language Model Alignment
Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang
01 Feb 2024

Transforming and Combining Rewards for Aligning Large Language Models
Zihao Wang, Chirag Nagpal, Jonathan Berant, Jacob Eisenstein, Alex D'Amour, Oluwasanmi Koyejo, Victor Veitch
01 Feb 2024

Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou, Bo Li, Haohan Wang
AAML · 30 Jan 2024

Security and Privacy Challenges of Large Language Models: A Survey
B. Das, M. H. Amini, Yanzhao Wu
PILM, ELM · 30 Jan 2024

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael I. Jordan, Jiantao Jiao
29 Jan 2024

KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants
Kaustubh D. Dhole
29 Jan 2024

Design Principles for Generative AI Applications
Justin D. Weisz, Jessica He, Michael J. Muller, Gabriela Hoefer, Rachel Miles, Werner Geyer
AI4CE · 25 Jan 2024

DsDm: Model-Aware Dataset Selection with Datamodels
Logan Engstrom, Axel Feldmann, Aleksander Madry
OODD · 23 Jan 2024