ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.02155
  4. Cited By
Training language models to follow instructions with human feedback

Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
    OSLM
    ALM
ArXivPDFHTML

Papers citing "Training language models to follow instructions with human feedback"

50 / 7,271 papers shown
Title
Will we run out of data? Limits of LLM scaling based on human-generated
  data
Will we run out of data? Limits of LLM scaling based on human-generated data
Pablo Villalobos
A. Ho
J. Sevilla
T. Besiroglu
Lennart Heim
Marius Hobbhahn
ALM
49
111
0
26 Oct 2022
RoMQA: A Benchmark for Robust, Multi-evidence, Multi-answer Question
  Answering
RoMQA: A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering
Victor Zhong
Weijia Shi
Wen-tau Yih
Luke Zettlemoyer
17
19
0
25 Oct 2022
Help me write a poem: Instruction Tuning as a Vehicle for Collaborative
  Poetry Writing
Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing
Tuhin Chakrabarty
Vishakh Padmakumar
Hengxing He
21
72
0
25 Oct 2022
Reinforcement Learning and Bandits for Speech and Language Processing:
  Tutorial, Review and Outlook
Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook
Baihan Lin
OffRL
AI4TS
32
27
0
24 Oct 2022
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
Maarten Sap
Ronan Le Bras
Daniel Fried
Yejin Choi
27
210
0
24 Oct 2022
Realistic Data Augmentation Framework for Enhancing Tabular Reasoning
Realistic Data Augmentation Framework for Enhancing Tabular Reasoning
D. K. Santhosh Kumar
Vivek Gupta
Soumya Sharma
Shuo Zhang
LMTD
21
3
0
23 Oct 2022
Graphically Structured Diffusion Models
Graphically Structured Diffusion Models
Christian D. Weilbach
William Harvey
Frank Wood
DiffM
40
7
0
20 Oct 2022
Boosting Natural Language Generation from Instructions with
  Meta-Learning
Boosting Natural Language Generation from Instructions with Meta-Learning
Budhaditya Deb
Guoqing Zheng
Ahmed Hassan Awadallah
24
13
0
20 Oct 2022
Large Language Models Can Self-Improve
Large Language Models Can Self-Improve
Jiaxin Huang
S. Gu
Le Hou
Yuexin Wu
Xuezhi Wang
Hongkun Yu
Jiawei Han
ReLM
AI4MH
LRM
47
568
0
20 Oct 2022
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
103
3,012
0
20 Oct 2022
Transcending Scaling Laws with 0.1% Extra Compute
Transcending Scaling Laws with 0.1% Extra Compute
Yi Tay
Jason W. Wei
Hyung Won Chung
Vinh Q. Tran
David R. So
...
Donald Metzler
Slav Petrov
N. Houlsby
Quoc V. Le
Mostafa Dehghani
LRM
47
68
0
20 Oct 2022
lo-fi: distributed fine-tuning without communication
lo-fi: distributed fine-tuning without communication
Mitchell Wortsman
Suchin Gururangan
Shen Li
Ali Farhadi
Ludwig Schmidt
Michael G. Rabbat
Ari S. Morcos
34
24
0
19 Oct 2022
Scaling Laws for Reward Model Overoptimization
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
41
489
0
19 Oct 2022
TabLLM: Few-shot Classification of Tabular Data with Large Language
  Models
TabLLM: Few-shot Classification of Tabular Data with Large Language Models
S. Hegselmann
Alejandro Buendia
Hunter Lang
Monica Agrawal
Xiaoyi Jiang
David Sontag
LMTD
57
213
0
19 Oct 2022
Revision Transformers: Instructing Language Models to Change their
  Values
Revision Transformers: Instructing Language Models to Change their Values
Felix Friedrich
Wolfgang Stammer
P. Schramowski
Kristian Kersting
KELM
33
6
0
19 Oct 2022
Aligning MAGMA by Few-Shot Learning and Finetuning
Aligning MAGMA by Few-Shot Learning and Finetuning
Jean-Charles Layoun
Alexis Roger
Irina Rish
VLM
27
2
0
18 Oct 2022
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Luke Vilnis
Yury Zemlyanskiy
Patrick C. Murray
Alexandre Passos
Sumit Sanghai
62
9
0
18 Oct 2022
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
121
1,023
0
17 Oct 2022
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations
  On-the-Fly
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly
Yi R. Fung
Tuhin Chakraborty
Hao Guo
Owen Rambow
Smaranda Muresan
Heng Ji
21
39
0
16 Oct 2022
The Debate Over Understanding in AI's Large Language Models
The Debate Over Understanding in AI's Large Language Models
Melanie Mitchell
D. Krakauer
ELM
74
203
0
14 Oct 2022
Large Language Models are few(1)-shot Table Reasoners
Large Language Models are few(1)-shot Table Reasoners
Wenhu Chen
LMTD
ReLM
LRM
22
139
0
13 Oct 2022
PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs (Journal Version)
PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs (Journal Version)
Hao Cui
R. Trimananda
A. Markopoulou
Scott Jordan
57
17
0
13 Oct 2022
EleutherAI: Going Beyond "Open Science" to "Science in the Open"
EleutherAI: Going Beyond "Open Science" to "Science in the Open"
Jason Phang
Herbie Bradley
Leo Gao
Louis Castricato
Stella Biderman
VLM
56
12
0
12 Oct 2022
SEAL : Interactive Tool for Systematic Error Analysis and Labeling
SEAL : Interactive Tool for Systematic Error Analysis and Labeling
Nazneen Rajani
Weixin Liang
Lingjiao Chen
Margaret Mitchell
James Zou
48
16
0
11 Oct 2022
Mind's Eye: Grounded Language Model Reasoning through Simulation
Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu
Jason W. Wei
S. Gu
Te-Yen Wu
Soroush Vosoughi
Claire Cui
Denny Zhou
Andrew M. Dai
ReLM
LRM
121
80
0
11 Oct 2022
Investigating the Failure Modes of the AUC metric and Exploring
  Alternatives for Evaluating Systems in Safety Critical Applications
Investigating the Failure Modes of the AUC metric and Exploring Alternatives for Evaluating Systems in Safety Critical Applications
Swaroop Mishra
Anjana Arunkumar
Chitta Baral
33
0
0
10 Oct 2022
Data-Efficiency with a Single GPU: An Exploration of Transfer Methods
  for Small Language Models
Data-Efficiency with a Single GPU: An Exploration of Transfer Methods for Small Language Models
Alon Albalak
Akshat Shrivastava
Chinnadhurai Sankar
Adithya Sagar
Mike Ross
40
3
0
08 Oct 2022
Automatic Chain of Thought Prompting in Large Language Models
Automatic Chain of Thought Prompting in Large Language Models
ZhuoSheng Zhang
Aston Zhang
Mu Li
Alexander J. Smola
ReLM
LRM
67
584
0
07 Oct 2022
Rainier: Reinforced Knowledge Introspector for Commonsense Question
  Answering
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering
Jiacheng Liu
Skyler Hallinan
Ximing Lu
Pengfei He
Sean Welleck
Hannaneh Hajishirzi
Yejin Choi
RALM
29
59
0
06 Oct 2022
Language Models are Multilingual Chain-of-Thought Reasoners
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
174
335
0
06 Oct 2022
Efficiently Enhancing Zero-Shot Performance of Instruction Following
  Model via Retrieval of Soft Prompt
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Seonghyeon Ye
Joel Jang
Doyoung Kim
Yongrae Jo
Minjoon Seo
VLM
39
2
0
06 Oct 2022
Guess the Instruction! Flipped Learning Makes Language Models Stronger
  Zero-Shot Learners
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
Seonghyeon Ye
Doyoung Kim
Joel Jang
Joongbo Shin
Minjoon Seo
FedML
VLM
UQCV
LRM
24
25
0
06 Oct 2022
Large Language Models are Pretty Good Zero-Shot Video Game Bug Detectors
Large Language Models are Pretty Good Zero-Shot Video Game Bug Detectors
Mohammad Reza Taesiri
Finlay Macklon
Yihe Wang
Hengshuo Shen
C. Bezemer
ELM
LLMAG
MLLM
47
13
0
05 Oct 2022
Ask Me Anything: A simple strategy for prompting language models
Ask Me Anything: A simple strategy for prompting language models
Simran Arora
A. Narayan
Mayee F. Chen
Laurel J. Orr
Neel Guha
Kush S. Bhatia
Ines Chami
Frederic Sala
Christopher Ré
ReLM
LRM
235
208
0
05 Oct 2022
Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Tushar Khot
H. Trivedi
Matthew Finlayson
Yao Fu
Kyle Richardson
Peter Clark
Ashish Sabharwal
ReLM
LRM
70
420
0
05 Oct 2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For
  Correct Goals
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals
Rohin Shah
Vikrant Varma
Ramana Kumar
Mary Phuong
Victoria Krakovna
J. Uesato
Zachary Kenton
40
68
0
04 Oct 2022
When to Make Exceptions: Exploring Language Models as Accounts of Human
  Moral Judgment
When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
Zhijing Jin
Sydney Levine
Fernando Gonzalez
Ojasv Kamal
Maarten Sap
Mrinmaya Sachan
Rada Mihalcea
J. Tenenbaum
Bernhard Schölkopf
ELM
LRM
34
90
0
04 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing:
  Benchmarks, Baselines, and Building Blocks for Natural Language Policy
  Optimization
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
31
240
0
03 Oct 2022
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of
  Chain-of-Thought
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
123
282
0
03 Oct 2022
Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple
  Tasks
Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks
Zhenhailong Wang
Xiaoman Pan
Dian Yu
Dong Yu
Jianshu Chen
Heng Ji
VLM
46
9
0
01 Oct 2022
Dynamic Prompt Learning via Policy Gradient for Semi-structured
  Mathematical Reasoning
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
Pan Lu
Liang Qiu
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Tanmay Rajpurohit
Peter Clark
Ashwin Kalyan
ReLM
LRM
61
269
0
29 Sep 2022
Bidirectional Language Models Are Also Few-shot Learners
Bidirectional Language Models Are Also Few-shot Learners
Ajay Patel
Bryan Li
Mohammad Sadegh Rasooli
Noah Constant
Colin Raffel
Chris Callison-Burch
LRM
70
45
0
29 Sep 2022
Improving alignment of dialogue agents via targeted human judgements
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
239
506
0
28 Sep 2022
Argumentative Reward Learning: Reasoning About Human Preferences
Argumentative Reward Learning: Reasoning About Human Preferences
Francis Rhys Ward
Francesco Belardinelli
Francesca Toni
HAI
92
2
0
28 Sep 2022
Generate rather than Retrieve: Large Language Models are Strong Context
  Generators
Generate rather than Retrieve: Large Language Models are Strong Context Generators
Wenhao Yu
Dan Iter
Shuohang Wang
Yichong Xu
Mingxuan Ju
Soumya Sanyal
Chenguang Zhu
Michael Zeng
Meng Jiang
RALM
AIMat
240
323
0
21 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,124
0
20 Sep 2022
Does CLIP Know My Face?
Does CLIP Know My Face?
Dominik Hintersdorf
Lukas Struppek
Manuel Brack
Felix Friedrich
P. Schramowski
Kristian Kersting
VLM
21
9
0
15 Sep 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial
  Intelligence with Humans
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
John J. Nay
ELM
AILaw
88
27
0
14 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Bin Cui
Ming-Hsuan Yang
DiffM
MedIm
226
1,314
0
02 Sep 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
ALM
20
18
0
30 Aug 2022
Previous
123...143144145146
Next