Fine-Tuning Language Models from Human Preferences

arXiv:1909.08593 · 18 September 2019
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
Tags: ALM
Links: arXiv (abs) · PDF · HTML

Papers citing "Fine-Tuning Language Models from Human Preferences"

50 / 1,265 papers shown
In-Context Learning Learns Label Relationships but Is Not Conventional Learning
Jannik Kossen, Y. Gal, Tom Rainforth
132 · 36 · 0 · 23 Jul 2023

Kernelized Offline Contextual Dueling Bandits
Viraj Mehta, Ojash Neopane, Vikramjeet Das, Sen Lin, J. Schneider, Willie Neiswanger
Tags: OffRL
78 · 4 · 0 · 21 Jul 2023

LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?
David Glukhov, Ilia Shumailov, Y. Gal, Nicolas Papernot, Vardan Papyan
Tags: AAML, ELM
96 · 58 · 0 · 20 Jul 2023

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum, Matthew Rahtz, János Kramár, Neel Nanda, G. Irving, Rohin Shah, Vladimir Mikulik
103 · 115 · 0 · 18 Jul 2023

A mixed policy to improve performance of language models on math problems
Gang Chen
Tags: ReLM, MoE, LRM
35 · 0 · 0 · 17 Jul 2023

On the application of Large Language Models for language teaching and assessment technology
Andrew Caines, Luca Benedetto, Shiva Taslimipoor, Christopher Davis, Yuan Gao, ..., Marek Rei, H. Yannakoudakis, Andrew Mullooly, D. Nicholls, P. Buttery
Tags: ELM
70 · 48 · 0 · 17 Jul 2023

Measuring Faithfulness in Chain-of-Thought Reasoning
Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson E. Denison, ..., Zac Hatfield-Dodds, Jared Kaplan, J. Brauner, Sam Bowman, Ethan Perez
Tags: ReLM, LRM
80 · 193 · 0 · 17 Jul 2023

Effective Prompt Extraction from Language Models
Yiming Zhang, Nicholas Carlini, Daphne Ippolito
Tags: MIACV, SILM
105 · 43 · 0 · 13 Jul 2023

Leveraging Contextual Counterfactuals Toward Belief Calibration
Qiuyi Zhang, Michael S. Lee, Sherol Chen
65 · 1 · 0 · 13 Jul 2023

A Comprehensive Overview of Large Language Models
Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, Ajmal Mian
Tags: OffRL
261 · 624 · 0 · 12 Jul 2023

Secrets of RLHF in Large Language Models Part I: PPO
Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, ..., Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
Tags: ALM, OffRL
122 · 177 · 0 · 11 Jul 2023

Loss Dynamics of Temporal Difference Reinforcement Learning
Blake Bordelon, P. Masset, Henry Kuo, Cengiz Pehlevan
Tags: AI4CE
58 · 0 · 0 · 10 Jul 2023

Advancements in Scientific Controllable Text Generation Methods
Arnav Goel, Medha Hira, Avinash Anand, Siddhesh Bangar, R. Shah
76 · 7 · 0 · 08 Jul 2023

Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators
Andreas Liesenfeld, Alianda Lopez, Mark Dingemanse
Tags: ALM
87 · 92 · 0 · 08 Jul 2023

Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
Aaron J. Li, Robin Netzorg, Zhihan Cheng, Zhuoqin Zhang, Bin Yu
70 · 3 · 0 · 08 Jul 2023

Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O'Keefe, ..., Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert F. Trager, Kevin J. Wolf
Tags: SILM
150 · 125 · 0 · 06 Jul 2023

A Survey on Evaluation of Large Language Models
Yu-Chu Chang, Xu Wang, Jindong Wang, Yuanyi Wu, Linyi Yang, ..., Yue Zhang, Yi-Ju Chang, Philip S. Yu, Qian Yang, Xingxu Xie
Tags: ELM, LM&MA, ALM
223 · 1,766 · 0 · 06 Jul 2023

Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback
Taeho Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu
92 · 8 · 0 · 06 Jul 2023

Training Models to Generate, Recognize, and Reframe Unhelpful Thoughts
Mounica Maddela, Megan Ung, Jing Xu, Andrea Madotto, H. Foran, Y-Lan Boureau
Tags: LRM
105 · 23 · 0 · 06 Jul 2023

Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei, Nika Haghtalab, Jacob Steinhardt
236 · 1,005 · 0 · 05 Jul 2023

Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review
M. Wong, Shangxin Guo, Ching Nam Hang, Siu-Wai Ho, C. Tan
97 · 87 · 0 · 04 Jul 2023

Causal Reinforcement Learning: A Survey
Zhi-Hong Deng, Jing Jiang, Guodong Long, Chen Zhang
Tags: CML, LRM
107 · 16 · 0 · 04 Jul 2023

Evaluating Shutdown Avoidance of Language Models in Textual Scenarios
Teun van der Weij, Simon Lermen, Leon Lang
Tags: LLMAG
67 · 4 · 0 · 03 Jul 2023

Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges, Niket Tandon, Tanja Käser, Antoine Bosselut
142 · 4 · 0 · 01 Jul 2023

Personality Traits in Large Language Models
Gregory Serapio-García, Mustafa Safdari, Clément Crepy, Luning Sun, Stephen Fitz, P. Romero, Marwa Abdulhai, Aleksandra Faust, Maja J. Matarić
Tags: LM&MA, LLMAG
207 · 127 · 0 · 01 Jul 2023

Preference Ranking Optimization for Human Alignment
Feifan Song, Bowen Yu, Minghao Li, Haiyang Yu, Fei Huang, Yongbin Li, Houfeng Wang
Tags: ALM
88 · 272 · 0 · 30 Jun 2023

Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus, Karina Nyugen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, ..., Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
147 · 245 · 0 · 28 Jun 2023

Is RLHF More Difficult than Standard RL?
Yuanhao Wang, Qinghua Liu, Chi Jin
Tags: OffRL
112 · 67 · 0 · 25 Jun 2023

A Survey on Multimodal Large Language Models
Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen
Tags: MLLM, LRM
138 · 612 · 0 · 23 Jun 2023

Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities
Xudong Shen, H. Brown, Jiashu Tao, Martin Strobel, Yao Tong, Akshay Narayan, Harold Soh, Finale Doshi-Velez
98 · 3 · 0 · 22 Jun 2023

An Overview of Catastrophic AI Risks
Dan Hendrycks, Mantas Mazeika, Thomas Woodside
Tags: SILM
82 · 186 · 0 · 21 Jun 2023

Opportunities and Risks of LLMs for Scalable Deliberation with Polis
Christopher T. Small, Ivan Vendrov, Esin Durmus, Hadjar Homaei, Elizabeth Barry, Julien Cornebise, Ted Suzman, Deep Ganguli, Colin Megill
101 · 30 · 0 · 20 Jun 2023

Learning to Generate Better Than Your LLM
Jonathan D. Chang, Kianté Brantley, Rajkumar Ramamurthy, Dipendra Kumar Misra, Wen Sun
72 · 49 · 0 · 20 Jun 2023

Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback
Shenghuan Sun, Gregory M. Goldgof, A. Butte, Ahmed Alaa
Tags: MedIm
70 · 14 · 0 · 16 Jun 2023

AD-AutoGPT: An Autonomous GPT for Alzheimer's Disease Infodemiology
Haixing Dai, Yiwei Li, Zheng Liu, Lin Zhao, Zihao Wu, ..., Quanzheng Li, Zhuo Chen, D. Zhang, Gengchen Mai, Tianming Liu
Tags: LM&MA
105 · 30 · 0 · 16 Jun 2023

Residual Q-Learning: Offline and Online Policy Customization without Value
Chenran Li, Chen Tang, Haruki Nishimura, Jean Mercat, Masayoshi Tomizuka, Wei Zhan
Tags: OffRL
99 · 7 · 0 · 15 Jun 2023

Explore, Establish, Exploit: Red Teaming Language Models from Scratch
Stephen Casper, Jason Lin, Joe Kwon, Gatlen Culp, Dylan Hadfield-Menell
Tags: AAML
60 · 99 · 0 · 15 Jun 2023

Domain-specific ChatBots for Science using Embeddings
Kevin G. Yager
70 · 8 · 0 · 15 Jun 2023

Improving Reading Comprehension Question Generation with Data Augmentation and Overgenerate-and-rank
Nischal Ashok Kumar, Nigel Fernandez, Zichao Wang, Andrew Lan
Tags: RALM
69 · 13 · 0 · 15 Jun 2023

Toward Grounded Commonsense Reasoning
Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh
Tags: LM&Ro, ReLM, LRM
88 · 10 · 0 · 14 Jun 2023

Chart2Vec: A Universal Embedding of Context-Aware Visualizations
Qing Chen, Ying Chen, Ruishi Zou, Wei Shuai, Yi Guo, Jiazhe Wang, Nana Cao
83 · 3 · 0 · 14 Jun 2023

Large Language Models Sometimes Generate Purely Negatively-Reinforced Text
Fabien Roger
Tags: SILM
43 · 0 · 0 · 13 Jun 2023

Robust Reinforcement Learning through Efficient Adversarial Herding
Juncheng Dong, Hao-Lun Hsu, Qitong Gao, Vahid Tarokh, Miroslav Pajic
81 · 4 · 0 · 12 Jun 2023

Defining and Exploring the Intelligence Space
P. Rosenbloom
85 · 0 · 0 · 10 Jun 2023

When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming
Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz
107 · 28 · 0 · 08 Jun 2023

Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning
Jaehyung Kim, Jinwoo Shin, Dongyeop Kang
64 · 2 · 0 · 08 Jun 2023

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord
Tags: MoMe
120 · 157 · 0 · 07 Jun 2023

Turning large language models into cognitive models
Marcel Binz, Eric Schulz
100 · 63 · 0 · 06 Jun 2023

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
Tags: KELM, HILM
160 · 584 · 0 · 06 Jun 2023

Adaptive and Personalized Exercise Generation for Online Language Learning
Peng Cui, Mrinmaya Sachan
Tags: AI4Ed
94 · 23 · 0 · 04 Jun 2023