Fine-Tuning Language Models from Human Preferences (arXiv:1909.08593)
18 September 2019
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving [ALM]

Papers citing "Fine-Tuning Language Models from Human Preferences" (showing 50 of 1,265)

The pitfalls of next-token prediction (11 Mar 2024)
  Gregor Bachmann, Vaishnavh Nagarajan

Evolving Knowledge Distillation with Large Language Models and Active Learning (11 Mar 2024)
  Chengyuan Liu, Yangyang Kang, Fubang Zhao, Kun Kuang, Zhuoren Jiang, Changlong Sun, Leilei Gan

Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text (09 Mar 2024)
  Sara Abdali, Richard Anarfi, C. Barberan, Jia He [DeLMO]

Bayesian Preference Elicitation with Language Models (08 Mar 2024)
  Kunal Handa, Yarin Gal, Ellie Pavlick, Noah D. Goodman, Jacob Andreas, Alex Tamkin, Belinda Z. Li

Provable Multi-Party Reinforcement Learning with Diverse Human Feedback (08 Mar 2024)
  Huiying Zhong, Zhun Deng, Weijie J. Su, Zhiwei Steven Wu, Linjun Zhang

A Survey on Human-AI Teaming with Large Pre-Trained Models (07 Mar 2024)
  Vanshika Vats, Marzia Binta Nizam, Minghao Liu, Ziyuan Wang, Richard Ho, ..., Celeste Shen, Rachel Shen, Nafisa Hussain, Kesav Ravichandran, James Davis [LM&MA]

Teaching Large Language Models to Reason with Reinforcement Learning (07 Mar 2024)
  Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu [ReLM, LRM]

Enhancing Court View Generation with Knowledge Injection and Guidance (07 Mar 2024)
  Ang Li, Yiquan Wu, Yifei Liu, Leilei Gan, Ming Cai, Kun Kuang [AILaw]

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning (05 Mar 2024)
  Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, ..., Yan Shoshitaishvili, Jimmy Ba, K. Esvelt, Alexandr Wang, Dan Hendrycks [ELM]

DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation (04 Mar 2024)
  Xueqing Wu, Rui Zheng, Jingzhen Sha, Te-Lin Wu, Hanyu Zhou, Mohan Tang, Kai-Wei Chang, Nanyun Peng, Haoran Huang

Enhancing LLM Safety via Constrained Direct Preference Optimization (04 Mar 2024)
  Zixuan Liu, Xiaolin Sun, Zizhan Zheng

Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review (04 Mar 2024)
  Iryna Hartsock, Ghulam Rasool

Improving the Validity of Automatically Generated Feedback via Reinforcement Learning (02 Mar 2024)
  Alexander Scarlatos, Digory Smith, Simon Woodhead, Andrew Lan [OffRL]

DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling (02 Mar 2024)
  Shanghaoran Quan [MoE, OffRL]

LLaMoCo: Instruction Tuning of Large Language Models for Optimization Code Generation (02 Mar 2024)
  Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Guojun Peng, Zhiguang Cao, Yining Ma, Yue-Jiao Gong [SyDa, ALM]

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models (01 Mar 2024)
  Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie [MedIm]

Improving Socratic Question Generation using Data Augmentation and Preference Optimization (01 Mar 2024)
  Nischal Ashok Kumar, Andrew Lan

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL (29 Feb 2024)
  Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar

Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency (29 Feb 2024)
  Akila Wickramasekara, Frank Breitinger, Mark Scanlon

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards (28 Feb 2024)
  Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction (28 Feb 2024)
  Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, Kai Chen [AAML]

On the Challenges and Opportunities in Generative AI (28 Feb 2024)
  Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin

From Large Language Models and Optimization to Decision Optimization CoPilot: A Research Manifesto (26 Feb 2024)
  Segev Wasserkrug, Léonard Boussioux, D. Hertog, F. Mirzazadeh, Ilker Birbil, Jannis Kurtz, Donato Maragno [LLMAG]

Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization (23 Feb 2024)
  Swaroop Nath, Tejpalsingh Siledar, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, H. Khadilkar, ..., Suman Banerjee, Amey Patil, Sudhanshu Singh, M. Chelliah, Nikesh Garera

CloChat: Understanding How People Customize, Interact, and Experience Personas in Large Language Models (23 Feb 2024)
  Juhye Ha, Hyeon Jeon, DaEun Han, Jinwook Seo, Changhoon Oh

Generalizing Reward Modeling for Out-of-Distribution Preference Learning (22 Feb 2024)
  Chen Jia

Chain-of-Thought Unfaithfulness as Disguised Accuracy (22 Feb 2024)
  Oliver Bentham, Nathan Stringham, Ana Marasović [LRM, HILM]

OmniPred: Language Models as Universal Regressors (22 Feb 2024)
  Xingyou Song, Oscar Li, Chansoo Lee, Bangding Yang, Daiyi Peng, Sagi Perel, Yutian Chen

SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization (21 Feb 2024)
  Prakamya Mishra, Zonghai Yao, Parth Vashisht, Feiyun Ouyang, Beining Wang, Vidhi Mody, Hong-ye Yu [SyDa, MedIm]

Kuaiji: the First Chinese Accounting Large Language Model (21 Feb 2024)
  Jiayuan Luo, Songhua Yang, Xiaoling Qiu, Panyu Chen, Yufei Nai, Wenxuan Zeng, Wentao Zhang, Xinke Jiang [RALM, ALM]

Privacy-Preserving Instructions for Aligning Large Language Models (21 Feb 2024)
  Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu

Large Language Models for Data Annotation: A Survey (21 Feb 2024)
  Zhen Tan, Dawei Li, Song Wang, Alimohammad Beigi, Bohan Jiang, Amrita Bhattacharjee, Mansooreh Karami, Wenlin Yao, Lu Cheng, Huan Liu [SyDa]

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive (20 Feb 2024)
  Arka Pal, Deep Karkhanis, Samuel Dooley, Manley Roberts, Siddartha Naidu, Colin White [OSLM]

A Survey on Knowledge Distillation of Large Language Models (20 Feb 2024)
  Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Dinesh Manocha [KELM, VLM]

Mode Estimation with Partial Feedback (20 Feb 2024)
  Charles Arnal, Vivien A. Cabannes, Vianney Perchet

TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification (20 Feb 2024)
  Martin Gubri, Dennis Ulmer, Hwaran Lee, Sangdoo Yun, Seong Joon Oh [SILM]

PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs (20 Feb 2024)
  An Liu, Zonghan Yang, Zhenhe Zhang, Qingyuan Hu, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu [ALM]

Reflect-RL: Two-Player Online RL Fine-Tuning for LMs (20 Feb 2024)
  Runlong Zhou, Simon S. Du, Beibin Li [OffRL]

Generative AI Security: Challenges and Countermeasures (20 Feb 2024)
  Banghua Zhu, Norman Mu, Jiantao Jiao, David Wagner [AAML, SILM]

Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems (20 Feb 2024)
  Zhaowei Zhang, Fengshuo Bai, Mingzhi Wang, Haoyang Ye, Chengdong Ma, Yaodong Yang

A Critical Evaluation of AI Feedback for Aligning Large Language Models (19 Feb 2024)
  Archit Sharma, Sedrick Scott Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar [ALM, LLMAG]

BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence (19 Feb 2024)
  Jiajie Jin, Yutao Zhu, Yujia Zhou, Zhicheng Dou [RALM]

Dissecting Human and LLM Preferences (17 Feb 2024)
  Junlong Li, Fan Zhou, Shichao Sun, Yikai Zhang, Hai Zhao, Pengfei Liu [ALM]

Aligning Large Language Models by On-Policy Self-Judgment (17 Feb 2024)
  Sangkyu Lee, Sungdong Kim, Ashkan Yousefpour, Minjoon Seo, Kang Min Yoo, Youngjae Yu [OSLM]

RLVF: Learning from Verbal Feedback without Overgeneralization (16 Feb 2024)
  Moritz Stephan, Alexander Khazatsky, Eric Mitchell, Annie S. Chen, Sheryl Hsu, Archit Sharma, Chelsea Finn

Humans or LLMs as the Judge? A Study on Judgement Biases (16 Feb 2024)
  Guiming Hardy Chen, Shunian Chen, Ziche Liu, Feng Jiang, Benyou Wang

Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements (16 Feb 2024)
  Ming Li, Jiuhai Chen, Lichang Chen, Dinesh Manocha

Direct Preference Optimization with an Offset (16 Feb 2024)
  Afra Amini, Tim Vieira, Ryan Cotterell

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment (15 Feb 2024)
  Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen

RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models (15 Feb 2024)
  Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra