Fine-Tuning Language Models from Human Preferences

18 September 2019
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
Geoffrey Irving
    ALM
ArXiv (abs) · PDF · HTML

Papers citing "Fine-Tuning Language Models from Human Preferences"

50 / 1,265 papers shown
Efficient Knowledge Infusion via KG-LLM Alignment
Zhouyu Jiang
Ling Zhong
Mengshu Sun
Jun Xu
Rui Sun
Hui Cai
Shuhan Luo
Qing Cui
65
10
0
06 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
128
10
0
05 Jun 2024
LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback
Timon Ziegenbein
Gabriella Skitalinskaya
Alireza Bayat Makou
Henning Wachsmuth
LLMAG, KELM
98
8
0
05 Jun 2024
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Rafael Rafailov
Yaswanth Chittepu
Ryan Park
Harshit S. Sikchi
Joey Hejna
Bradley Knox
Chelsea Finn
S. Niekum
127
69
0
05 Jun 2024
HYDRA: Model Factorization Framework for Black-Box LLM Personalization
Yuchen Zhuang
Haotian Sun
Yue Yu
Rushi Qiang
Qifan Wang
Chao Zhang
Bo Dai
AAML
126
26
0
05 Jun 2024
Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data
Haolong Li
Yu Ma
Yinqi Zhang
Chen Ye
Jie Chen
ReLM, LRM
62
4
0
04 Jun 2024
The Life Cycle of Large Language Models: A Review of Biases in Education
Jinsook Lee
Yann Hicke
Renzhe Yu
Christopher A. Brooks
René F. Kizilcec
AI4Ed
99
2
0
03 Jun 2024
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
Lin Gui
Cristina Garbacea
Victor Veitch
BDL, LM&MA
112
49
0
02 Jun 2024
Harnessing Business and Media Insights with Large Language Models
Yujia Bao
Ankit Parag Shah
Neeru Narang
Jonathan Rivers
Rajeev Maksey
...
Gyuhak Kim
Dengpan Yin
Don Hejna
Mo Nomeli
Wei Wei
AIFin
82
5
0
02 Jun 2024
Aligning Language Models with Demonstrated Feedback
Omar Shaikh
Michelle S. Lam
Joey Hejna
Yijia Shao
Michael S. Bernstein
Diyi Yang
ALM
105
26
0
02 Jun 2024
A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
Piotr Wojciech Mirowski
Juliette Love
K. Mathewson
Shakir Mohamed
80
23
0
31 May 2024
Improving Reward Models with Synthetic Critiques
Zihuiwen Ye
Fraser Greenlee-Scott
Max Bartolo
Phil Blunsom
Jon Ander Campos
Matthias Gallé
ALM, SyDa, LRM
103
24
0
31 May 2024
Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
Yueqin Yin
Zhendong Wang
Yujia Xie
Weizhu Chen
Mingyuan Zhou
93
4
0
31 May 2024
Transfer Q Star: Principled Decoding for LLM Alignment
Souradip Chakraborty
Soumya Suvra Ghosal
Ming Yin
Dinesh Manocha
Mengdi Wang
Amrit Singh Bedi
Furong Huang
120
33
0
30 May 2024
Group Robust Preference Optimization in Reward-free RLHF
Shyam Sundhar Ramesh
Yifan Hu
Iason Chaimalas
Viraj Mehta
Pier Giuseppe Sessa
Haitham Bou-Ammar
Ilija Bogunovic
89
39
0
30 May 2024
TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models
Chen Zhang
Chengguang Tang
Dading Chong
Ke Shi
Guohua Tang
Feng Jiang
Haizhou Li
73
4
0
30 May 2024
Preference Alignment with Flow Matching
Minu Kim
Yongsik Lee
Sehyeok Kang
Jihwan Oh
Song Chong
Seyoung Yun
91
2
0
30 May 2024
Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding
Kuo Liao
Shuang Li
Meng Zhao
Liqun Liu
Mengge Xue
Zhenyu Hu
Honglin Han
Chengguo Yin
86
1
0
30 May 2024
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Xinmeng Huang
Shuo Li
Yan Sun
Osbert Bastani
Hamed Hassani
Dongsheng Ding
89
10
0
29 May 2024
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
Zhanhui Zhou
Zhixuan Liu
Jie Liu
Zhichen Dong
Chao Yang
Yu Qiao
ALM
109
27
0
29 May 2024
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
121
29
0
29 May 2024
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback
Jiachen Li
Weixi Feng
Tsu-Jui Fu
Xinyi Wang
Sugato Basu
Wenhu Chen
William Y. Wang
VGen
89
34
0
29 May 2024
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
Jian-Qiao Zhu
Haijiang Yan
Thomas Griffiths
150
3
0
29 May 2024
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Aman Chadha
Eugenio Culurciello
109
18
0
28 May 2024
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
Jiaxiang Li
Siliang Zeng
Hoi-To Wai
Chenliang Li
Alfredo García
Mingyi Hong
133
18
0
28 May 2024
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
Yuanpu Cao
Tianrong Zhang
Bochuan Cao
Ziyi Yin
Lu Lin
Fenglong Ma
Jinghui Chen
LLMSV
92
33
0
28 May 2024
Linguistic Collapse: Neural Collapse in (Large) Language Models
Robert Wu
Vardan Papyan
97
16
0
28 May 2024
Prompt Optimization with Human Feedback
Xiaoqiang Lin
Zhongxiang Dai
Arun Verma
See-Kiong Ng
Patrick Jaillet
K. H. Low
AAML
97
12
0
27 May 2024
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models
Chia-Yi Hsu
Yu-Lin Tsai
Chih-Hsun Lin
Pin-Yu Chen
Chia-Mu Yu
Chun-ying Huang
143
56
0
27 May 2024
Triple Preference Optimization: Achieving Better Alignment with Less Data in a Single Step Optimization
Amir Saeidi
Shivanshu Verma
Aswin Rrv
Chitta Baral
85
5
0
26 May 2024
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu
Miao Lu
Shenao Zhang
Boyi Liu
Hongyi Guo
Yingxiang Yang
Jose H. Blanchet
Zhaoran Wang
147
62
0
26 May 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
139
15
0
26 May 2024
Learning Generalizable Human Motion Generator with Reinforcement Learning
Yunyao Mao
Xiaoyang Liu
Wen-gang Zhou
Zhenbo Lu
Houqiang Li
72
4
0
24 May 2024
Embedding-Aligned Language Models
Guy Tennenholtz
Yinlam Chow
Chih-Wei Hsu
Lior Shani
Ethan Liang
Craig Boutilier
AIFin
138
2
0
24 May 2024
Bayesian WeakS-to-Strong from Text Classification to Generation
Ziyun Cui
Ziyang Zhang
Wen Wu
Chao Zhang
127
3
0
24 May 2024
Direct Preference Optimization With Unobserved Preference Heterogeneity
Keertana Chidambaram
Karthik Vinay Seetharaman
Vasilis Syrgkanis
92
10
0
23 May 2024
Axioms for AI Alignment from Human Feedback
Luise Ge
Daniel Halpern
Evi Micha
Ariel D. Procaccia
Itai Shapira
Yevgeniy Vorobeychik
Junlin Wu
77
24
0
23 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng
Mengzhou Xia
Danqi Chen
185
492
0
23 May 2024
Multi-turn Reinforcement Learning from Preference Human Feedback
Lior Shani
Aviv Rosenberg
Asaf B. Cassel
Oran Lang
Daniele Calandriello
...
Bilal Piot
Idan Szpektor
Avinatan Hassidim
Yossi Matias
Rémi Munos
97
34
0
23 May 2024
Calibrated Self-Rewarding Vision Language Models
Yiyang Zhou
Zhiyuan Fan
Dongjie Cheng
Sihan Yang
Zhaorun Chen
Chenhang Cui
Xiyao Wang
Yun Li
Linjun Zhang
Huaxiu Yao
VLM
141
34
0
23 May 2024
Efficient Universal Goal Hijacking with Semantics-guided Prompt Organization
Yihao Huang
Chong Wang
Xiaojun Jia
Qing Guo
Felix Juefei Xu
Jian Zhang
G. Pu
Yang Liu
109
9
0
23 May 2024
Annotation-Efficient Preference Optimization for Language Model Alignment
Yuu Jinnai
Ukyo Honda
65
0
0
22 May 2024
More Distinctively Black and Feminine Faces Lead to Increased Stereotyping in Vision-Language Models
Messi H.J. Lee
Jacob M. Montgomery
Calvin K. Lai
VLM
71
0
0
22 May 2024
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Ziqiao Ma
Zekun Wang
Joyce Chai
146
4
0
22 May 2024
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
Rheeya Uppaal
Apratim De
Yiting He
Yiqiao Zhong
Junjie Hu
158
7
0
22 May 2024
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
N. Sebe
Mubarak Shah
EGVM
203
7
0
22 May 2024
SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling
Xingzhou Lou
Junge Zhang
Jian Xie
Lifeng Liu
Dong Yan
Kaiqi Huang
96
13
0
21 May 2024
A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback
Kihyun Kim
Jiawei Zhang
Asuman Ozdaglar
P. Parrilo
OffRL
144
2
0
20 May 2024
A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers
Tom Roth
Inigo Jauregi Unanue
A. Abuadbba
Massimo Piccardi
AAML, SILM
74
1
0
20 May 2024
Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
Ermo Hua
Biqing Qi
Kaiyan Zhang
Yue Yu
Ning Ding
Xingtai Lv
Kai Tian
Bowen Zhou
78
4
0
20 May 2024