ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.02155
  4. Cited By
Training language models to follow instructions with human feedback

Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
    OSLM
    ALM
ArXivPDFHTML

Papers citing "Training language models to follow instructions with human feedback"

50 / 4,678 papers shown
Title
Is Reinforcement Learning (Not) for Natural Language Processing:
  Benchmarks, Baselines, and Building Blocks for Natural Language Policy
  Optimization
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
31
240
0
03 Oct 2022
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of
  Chain-of-Thought
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
123
277
0
03 Oct 2022
Dynamic Prompt Learning via Policy Gradient for Semi-structured
  Mathematical Reasoning
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
Pan Lu
Liang Qiu
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Tanmay Rajpurohit
Peter Clark
Ashwin Kalyan
ReLM
LRM
55
269
0
29 Sep 2022
Bidirectional Language Models Are Also Few-shot Learners
Bidirectional Language Models Are Also Few-shot Learners
Ajay Patel
Bryan Li
Mohammad Sadegh Rasooli
Noah Constant
Colin Raffel
Chris Callison-Burch
LRM
70
45
0
29 Sep 2022
Improving alignment of dialogue agents via targeted human judgements
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
230
506
0
28 Sep 2022
Argumentative Reward Learning: Reasoning About Human Preferences
Argumentative Reward Learning: Reasoning About Human Preferences
Francis Rhys Ward
Francesco Belardinelli
Francesca Toni
HAI
92
2
0
28 Sep 2022
Generate rather than Retrieve: Large Language Models are Strong Context
  Generators
Generate rather than Retrieve: Large Language Models are Strong Context Generators
W. Yu
Dan Iter
Shuohang Wang
Yichong Xu
Mingxuan Ju
Soumya Sanyal
Chenguang Zhu
Michael Zeng
Meng Jiang
RALM
AIMat
235
322
0
21 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,113
0
20 Sep 2022
Does CLIP Know My Face?
Does CLIP Know My Face?
Dominik Hintersdorf
Lukas Struppek
Manuel Brack
Felix Friedrich
P. Schramowski
Kristian Kersting
VLM
21
9
0
15 Sep 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial
  Intelligence with Humans
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
John J. Nay
ELM
AILaw
88
27
0
14 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Bin Cui
Ming-Hsuan Yang
DiffM
MedIm
224
1,311
0
02 Sep 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
ALM
20
18
0
30 Aug 2022
The Alignment Problem from a Deep Learning Perspective
The Alignment Problem from a Deep Learning Perspective
Richard Ngo
Lawrence Chan
Sören Mindermann
68
183
0
30 Aug 2022
PEER: A Collaborative Language Model
PEER: A Collaborative Language Model
Timo Schick
Jane Dwivedi-Yu
Zhengbao Jiang
Fabio Petroni
Patrick Lewis
Gautier Izacard
Qingfei You
Christoforos Nalmpantis
Edouard Grave
Sebastian Riedel
ALM
50
93
0
24 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors,
  and Lessons Learned
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
231
446
0
23 Aug 2022
Using Large Language Models to Simulate Multiple Humans and Replicate
  Human Subject Studies
Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
Gati Aher
RosaI. Arriaga
Adam Tauman Kalai
59
350
0
18 Aug 2022
Abstractive Meeting Summarization: A Survey
Abstractive Meeting Summarization: A Survey
Virgile Rennard
Guokan Shang
Julie Hunter
Michalis Vazirgiannis
37
15
0
08 Aug 2022
Learning New Skills after Deployment: Improving open-domain
  internet-driven dialogue with human feedback
Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback
Jing Xu
Megan Ung
M. Komeili
Kushal Arora
Y-Lan Boureau
Jason Weston
30
37
0
05 Aug 2022
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq
  Model
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan
Shankar Ananthakrishnan
Jack G. M. FitzGerald
Rahul Gupta
Wael Hamza
...
Mukund Sridhar
Fabian Triefenbach
Apurv Verma
Gokhan Tur
Premkumar Natarajan
56
82
0
02 Aug 2022
SMART: Sentences as Basic Units for Text Evaluation
SMART: Sentences as Basic Units for Text Evaluation
Reinald Kim Amplayo
Peter J. Liu
Yao-Min Zhao
Shashi Narayan
32
21
0
01 Aug 2022
Few-shot Adaptation Works with UnpredicTable Data
Few-shot Adaptation Works with UnpredicTable Data
Jun Shern Chan
Michael Pieler
Jonathan Jao
Jérémy Scheurer
Ethan Perez
31
5
0
01 Aug 2022
Can large language models reason about medical questions?
Can large language models reason about medical questions?
Valentin Liévin
C. Hother
Andreas Geert Motzfeldt
Ole Winther
ELM
LM&MA
AI4MH
LRM
29
300
0
17 Jul 2022
Language models show human-like content effects on reasoning tasks
Language models show human-like content effects on reasoning tasks
Ishita Dasgupta
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Hannah R. Sheahan
Antonia Creswell
D. Kumaran
James L. McClelland
Felix Hill
ReLM
LRM
30
181
0
14 Jul 2022
What is Flagged in Uncertainty Quantification? Latent Density Models for
  Uncertainty Categorization
What is Flagged in Uncertainty Quantification? Latent Density Models for Uncertainty Categorization
Hao Sun
B. V. Breugel
Jonathan Crabbé
Nabeel Seedat
M. Schaar
29
4
0
11 Jul 2022
Big Learning
Big Learning
Yulai Cong
Miaoyun Zhao
AI4CE
32
0
0
08 Jul 2022
Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback
Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback
Adhyyan Narang
Omid Sadeghi
Lillian J. Ratliff
Maryam Fazel
J. Bilmes
OffRL
15
1
0
07 Jul 2022
BioTABQA: Instruction Learning for Biomedical Table Question Answering
BioTABQA: Instruction Learning for Biomedical Table Question Answering
Man Luo
S. Saxena
Swaroop Mishra
Mihir Parmar
Chitta Baral
LMTD
157
15
0
06 Jul 2022
CodeRL: Mastering Code Generation through Pretrained Models and Deep
  Reinforcement Learning
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le
Yue Wang
Akhilesh Deepak Gotmare
Silvio Savarese
Guosheng Lin
SyDa
ALM
135
240
0
05 Jul 2022
Rationale-Augmented Ensembles in Language Models
Rationale-Augmented Ensembles in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Denny Zhou
ReLM
LRM
35
124
0
02 Jul 2022
PlanBench: An Extensible Benchmark for Evaluating Large Language Models
  on Planning and Reasoning about Change
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
Karthik Valmeekam
Matthew Marquez
Alberto Olmo
S. Sreedharan
Subbarao Kambhampati
ReLM
LRM
27
197
0
21 Jun 2022
Interactive Visual Reasoning under Uncertainty
Interactive Visual Reasoning under Uncertainty
Manjie Xu
Guangyuan Jiang
Wei Liang
Song-Chun Zhu
Yixin Zhu
LRM
47
5
0
18 Jun 2022
Emergent Abilities of Large Language Models
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
78
2,354
0
15 Jun 2022
Language Models are General-Purpose Interfaces
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
30
96
0
13 Jun 2022
X-Risk Analysis for AI Research
X-Risk Analysis for AI Research
Dan Hendrycks
Mantas Mazeika
35
68
0
13 Jun 2022
Offline RL for Natural Language Generation with Implicit Language Q
  Learning
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
133
102
0
05 Jun 2022
On Reinforcement Learning and Distribution Matching for Fine-Tuning
  Language Models with no Catastrophic Forgetting
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
25
51
0
01 Jun 2022
Leveraging Pre-Trained Language Models to Streamline Natural Language
  Interaction for Self-Tracking
Leveraging Pre-Trained Language Models to Streamline Natural Language Interaction for Self-Tracking
Young-Ho Kim
Sungdong Kim
Minsuk Chang
Sang-Woo Lee
46
4
0
31 May 2022
Can Foundation Models Help Us Achieve Perfect Secrecy?
Can Foundation Models Help Us Achieve Perfect Secrecy?
Simran Arora
Christopher Ré
FedML
24
6
0
27 May 2022
Quark: Controllable Text Generation with Reinforced Unlearning
Quark: Controllable Text Generation with Reinforced Unlearning
Ximing Lu
Sean Welleck
Jack Hessel
Liwei Jiang
Lianhui Qin
Peter West
Prithviraj Ammanabrolu
Yejin Choi
MU
66
206
0
26 May 2022
Ground-Truth Labels Matter: A Deeper Look into Input-Label
  Demonstrations
Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations
Kang Min Yoo
Junyeob Kim
Hyuhng Joon Kim
Hyunsoo Cho
Hwiyeol Jo
Sang-Woo Lee
Sang-goo Lee
Taeuk Kim
31
123
0
25 May 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue
  through Instruction Tuning
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Prakhar Gupta
Cathy Jiao
Yi-Ting Yeh
Shikib Mehri
M. Eskénazi
Jeffrey P. Bigham
ALM
44
47
0
25 May 2022
QAMPARI: An Open-domain Question Answering Benchmark for Questions with
  Many Answers from Multiple Paragraphs
QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs
S. Amouyal
Tomer Wolfson
Ohad Rubin
Ori Yoran
Jonathan Herzig
Jonathan Berant
RALM
VLM
24
21
0
25 May 2022
Is a Question Decomposition Unit All We Need?
Is a Question Decomposition Unit All We Need?
Pruthvi H. Patel
Swaroop Mishra
Mihir Parmar
Chitta Baral
ReLM
158
51
0
25 May 2022
Non-Programmers Can Label Programs Indirectly via Active Examples: A
  Case Study with Text-to-SQL
Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Ruiqi Zhong
Charles Burton Snell
Dan Klein
Jason Eisner
21
8
0
25 May 2022
Large Language Models are Zero-Shot Reasoners
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
328
4,077
0
24 May 2022
RankGen: Improving Text Generation with Large Ranking Models
RankGen: Improving Text Generation with Large Ranking Models
Kalpesh Krishna
Yapei Chang
John Wieting
Mohit Iyyer
AIMat
24
68
0
19 May 2022
A Generalist Agent
A Generalist Agent
Scott E. Reed
Konrad Zolna
Emilio Parisotto
Sergio Gomez Colmenarejo
Alexander Novikov
...
Yutian Chen
R. Hadsell
Oriol Vinyals
Mahyar Bordbar
Nando de Freitas
LM&Ro
LLMAG
AI4CE
71
791
0
12 May 2022
UL2: Unifying Language Learning Paradigms
UL2: Unifying Language Learning Paradigms
Yi Tay
Mostafa Dehghani
Vinh Q. Tran
Xavier Garcia
Jason W. Wei
...
Tal Schuster
H. Zheng
Denny Zhou
N. Houlsby
Donald Metzler
AI4CE
59
297
0
10 May 2022
Language Models in the Loop: Incorporating Prompting into Weak
  Supervision
Language Models in the Loop: Incorporating Prompting into Weak Supervision
Ryan Smith
Jason Alan Fries
Braden Hancock
Stephen H. Bach
53
53
0
04 May 2022
Improving In-Context Few-Shot Learning via Self-Supervised Training
Improving In-Context Few-Shot Learning via Self-Supervised Training
Mingda Chen
Jingfei Du
Ramakanth Pasunuru
Todor Mihaylov
Srini Iyer
Ves Stoyanov
Zornitsa Kozareva
SSL
AI4MH
38
64
0
03 May 2022
Previous
123...929394
Next