Fine-Tuning Language Models from Human Preferences
arXiv 1909.08593 (v2, latest) · 18 September 2019
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving
ALM

Papers citing "Fine-Tuning Language Models from Human Preferences"

50 of 1,265 papers shown
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé, Nino Vieillard, Léonard Hussenot, Robert Dadashi, Geoffrey Cideron, Olivier Bachem, Johan Ferret
22 Jan 2024
Enhancing Recommendation Diversity by Re-ranking with Large Language Models
Diego Carraro, Derek Bridge
LRM, ALM · 21 Jan 2024
Knowledge Verification to Nip Hallucination in the Bud
Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi
HILM · 19 Jan 2024
PHOENIX: Open-Source Language Adaption for Direct Preference Optimization
Matthias Uhlig, Sigurd Schacht, Sudarshan Kamath Barkur
ALM · 19 Jan 2024
Self-Rewarding Language Models
Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston
ReLM, SyDa, ALM, LRM · 18 Jan 2024
Aligning Large Language Models with Counterfactual DPO
Bradley Butcher
ALM · 17 Jan 2024
ReFT: Reasoning with Reinforced Fine-Tuning
Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li
OffRL, LRM, ReLM · 17 Jan 2024
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton W. Murray, Young Jin Kim
ALM · 16 Jan 2024
Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models
Xingzhou Lou, Junge Zhang, Ziyan Wang, Kaiqi Huang, Yali Du
15 Jan 2024
Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng
OffRL, ALM · 14 Jan 2024
Reinforcement Learning from LLM Feedback to Counteract Goal Misgeneralization
Houda Nait El Barj, Théophile Sautory
14 Jan 2024
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Junchen Wan, Fuzheng Zhang, Di Zhang, Ji-Rong Wen
KELM · 11 Jan 2024
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Bing Wang, Rui Zheng, Luyao Chen, Yan Liu, Shihan Dou, ..., Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yuanyuan Jiang
ALM · 11 Jan 2024
Concept Alignment
Sunayana Rane, Polyphony J. Bruna, Ilia Sucholutsky, Christopher Kello, Thomas Griffiths
CVBM · 09 Jan 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal
OffRL · 08 Jan 2024
Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models
Nigel Doering, Cyril Gorlla, Trevor Tuttle, Adhvaith Vijay
08 Jan 2024
MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance
Renjie Pi, Tianyang Han, Jianshu Zhang, Yueqi Xie, Boyao Wang, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang
AAML · 05 Jan 2024
LLaMA Pro: Progressive LLaMA with Block Expansion
Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ying Shan, Ping Luo
CLL · 04 Jan 2024
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu
SyDa · 02 Jan 2024
A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models
Vansh Sharma, Venkat Raman
31 Dec 2023
keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM
Chaojie Wang, Yishi Xu, Zhong Peng, Chenxi Zhang, Bo Chen, Xinrun Wang, Lei Feng, Bo An
31 Dec 2023
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
AI4CE · 30 Dec 2023
AI Content Self-Detection for Transformer-based Large Language Models
Antonio Junior Alves Caiado, Michael Hahsler
DeLMO · 28 Dec 2023
Large Language Models for Conducting Advanced Text Analytics Information Systems Research
Benjamin Ampel, Chi-Heng Yang, Junjie Hu, Hsinchun Chen
27 Dec 2023
Preference as Reward, Maximum Preference Optimization with Importance Sampling
Zaifan Jiang, Xing Huang, Chao Wei
27 Dec 2023
Can ChatGPT Read Who You Are?
Erik Derner, D. Kučera, Nuria Oliver, Jan Zahálka
26 Dec 2023
AutoTask: Executing Arbitrary Voice Commands by Exploring and Learning from Mobile GUI
Lihang Pan, Bowen Wang, Chun Yu, Yuxuan Chen, Xiangyu Zhang, Yuanchun Shi
26 Dec 2023
Aligning Large Language Models with Human Preferences through Representation Engineering
Tianlong Li, Xiaohua Wang, Muling Wu, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang
26 Dec 2023
Advancing Abductive Reasoning in Knowledge Graphs through Complex Logical Hypothesis Generation
Jiaxin Bai, Yicheng Wang, Tianshi Zheng, Yue Guo, Xin Liu, Yangqiu Song
25 Dec 2023
Prompt Valuation Based on Shapley Values
Hanxi Liu, Xiaokai Mao, Haocheng Xia, Jian Lou, Jinfei Liu
24 Dec 2023
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, ..., Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
ALM, ELM · 23 Dec 2023
Reasons to Reject? Aligning Language Models with Judgments
Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi
ALM · 22 Dec 2023
A Unified Industrial Large Knowledge Model Framework in Smart Manufacturing
Jay Lee, Hanqi Su
22 Dec 2023
OpenRL: A Unified Reinforcement Learning Framework
Shiyu Huang, Wentse Chen, Yiwen Sun, Fuqing Bie, Weijuan Tu
20 Dec 2023
Learning and Forgetting Unsafe Examples in Large Language Models
Jiachen Zhao, Zhun Deng, David Madras, James Zou, Mengye Ren
MU, KELM, CLL · 20 Dec 2023
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong, Hanze Dong, Chen Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang
OffRL · 18 Dec 2023
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
13 Dec 2023
Steering Llama 2 via Contrastive Activation Addition
Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner
LLMSV · 09 Dec 2023
Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning
Sabrina McCallum, Max Taylor-Davies, Stefano V. Albrecht, Alessandro Suglia
07 Dec 2023
A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation
Jarad Forristal, Niloofar Mireshghallah, Greg Durrett, Taylor Berg-Kirkpatrick
07 Dec 2023
CLadder: Assessing Causal Reasoning in Language Models
Zhijing Jin, Yuen Chen, Felix Leeb, Luigi Gresele, Ojasv Kamal, ..., Kevin Blin, Fernando Gonzalez Adauto, Max Kleiman-Weiner, Mrinmaya Sachan, Bernhard Schölkopf
ReLM, ELM, LRM · 07 Dec 2023
A Study on the Calibration of In-context Learning
Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Phillips Foster, Eric Xing, Hima Lakkaraju, Sham Kakade
07 Dec 2023
Language Model Alignment with Elastic Reset
Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron Courville
KELM · 06 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish, Moran Yanuka, Morris Alper, Raja Giryes, Hadar Averbuch-Elor
MLLM, VLM · 06 Dec 2023
ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference
Tianchi Cai, Xierui Song, Jiyan Jiang, Fei Teng, Jinjie Gu, Guannan Zhang
ALM · 05 Dec 2023
Efficient Online Data Mixing For Language Model Pre-Training
Alon Albalak, Liangming Pan, Colin Raffel, Wenjie Wang
05 Dec 2023
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi
04 Dec 2023
Honesty Is the Best Policy: Defining and Mitigating AI Deception
Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt
03 Dec 2023
Distributed Reinforcement Learning for Molecular Design: Antioxidant case
Huanyi Qin, D. Akhiyarov, Sophie Loehle, Kenneth Chiu, Mauricio Araya-Polo
03 Dec 2023
Axiomatic Preference Modeling for Longform Question Answering
Corby Rosset, Guoqing Zheng, Victor C. Dibia, Ahmed Hassan Awadallah, Paul Bennett
SyDa · 02 Dec 2023
Page 16 of 26