ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.02155
  4. Cited By
Training language models to follow instructions with human feedback

Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
    OSLM
    ALM
ArXivPDFHTML

Papers citing "Training language models to follow instructions with human feedback"

50 / 4,604 papers shown
Title
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
Maarten Sap
Ronan Le Bras
Daniel Fried
Yejin Choi
27
207
0
24 Oct 2022
Realistic Data Augmentation Framework for Enhancing Tabular Reasoning
Realistic Data Augmentation Framework for Enhancing Tabular Reasoning
D. K. Santhosh Kumar
Vivek Gupta
Soumya Sharma
Shuo Zhang
LMTD
21
3
0
23 Oct 2022
Graphically Structured Diffusion Models
Graphically Structured Diffusion Models
Christian Weilbach
William Harvey
Frank D. Wood
DiffM
35
7
0
20 Oct 2022
Boosting Natural Language Generation from Instructions with
  Meta-Learning
Boosting Natural Language Generation from Instructions with Meta-Learning
Budhaditya Deb
Guoqing Zheng
Ahmed Hassan Awadallah
24
13
0
20 Oct 2022
Large Language Models Can Self-Improve
Large Language Models Can Self-Improve
Jiaxin Huang
S. Gu
Le Hou
Yuexin Wu
Xuezhi Wang
Hongkun Yu
Jiawei Han
ReLM
AI4MH
LRM
47
564
0
20 Oct 2022
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
64
2,989
0
20 Oct 2022
Transcending Scaling Laws with 0.1% Extra Compute
Transcending Scaling Laws with 0.1% Extra Compute
Yi Tay
Jason W. Wei
Hyung Won Chung
Vinh Q. Tran
David R. So
...
Donald Metzler
Slav Petrov
N. Houlsby
Quoc V. Le
Mostafa Dehghani
LRM
44
68
0
20 Oct 2022
lo-fi: distributed fine-tuning without communication
lo-fi: distributed fine-tuning without communication
Mitchell Wortsman
Suchin Gururangan
Shen Li
Ali Farhadi
Ludwig Schmidt
Michael G. Rabbat
Ari S. Morcos
32
24
0
19 Oct 2022
Scaling Laws for Reward Model Overoptimization
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
41
475
0
19 Oct 2022
TabLLM: Few-shot Classification of Tabular Data with Large Language
  Models
TabLLM: Few-shot Classification of Tabular Data with Large Language Models
S. Hegselmann
Alejandro Buendia
Hunter Lang
Monica Agrawal
Xiaoyi Jiang
David Sontag
LMTD
55
211
0
19 Oct 2022
Revision Transformers: Instructing Language Models to Change their
  Values
Revision Transformers: Instructing Language Models to Change their Values
Felix Friedrich
Wolfgang Stammer
P. Schramowski
Kristian Kersting
KELM
33
6
0
19 Oct 2022
Aligning MAGMA by Few-Shot Learning and Finetuning
Aligning MAGMA by Few-Shot Learning and Finetuning
Jean-Charles Layoun
Alexis Roger
Irina Rish
VLM
22
2
0
18 Oct 2022
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Luke Vilnis
Yury Zemlyanskiy
Patrick C. Murray
Alexandre Passos
Sumit Sanghai
62
9
0
18 Oct 2022
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
92
997
0
17 Oct 2022
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations
  On-the-Fly
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly
Yi Ren Fung
Tuhin Chakraborty
Hao Guo
Owen Rambow
Smaranda Muresan
Heng Ji
21
39
0
16 Oct 2022
The Debate Over Understanding in AI's Large Language Models
The Debate Over Understanding in AI's Large Language Models
Melanie Mitchell
D. Krakauer
ELM
74
203
0
14 Oct 2022
Large Language Models are few(1)-shot Table Reasoners
Large Language Models are few(1)-shot Table Reasoners
Wenhu Chen
LMTD
ReLM
LRM
22
138
0
13 Oct 2022
PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs (Journal Version)
PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs (Journal Version)
Hao Cui
R. Trimananda
A. Markopoulou
Scott Jordan
49
17
0
13 Oct 2022
EleutherAI: Going Beyond "Open Science" to "Science in the Open"
EleutherAI: Going Beyond "Open Science" to "Science in the Open"
Jason Phang
Herbie Bradley
Leo Gao
Louis Castricato
Stella Biderman
VLM
48
12
0
12 Oct 2022
SEAL : Interactive Tool for Systematic Error Analysis and Labeling
SEAL : Interactive Tool for Systematic Error Analysis and Labeling
Nazneen Rajani
Weixin Liang
Lingjiao Chen
Margaret Mitchell
James Zou
42
16
0
11 Oct 2022
Mind's Eye: Grounded Language Model Reasoning through Simulation
Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu
Jason W. Wei
S. Gu
Te-Yen Wu
Soroush Vosoughi
Claire Cui
Denny Zhou
Andrew M. Dai
ReLM
LRM
118
79
0
11 Oct 2022
Data-Efficiency with a Single GPU: An Exploration of Transfer Methods
  for Small Language Models
Data-Efficiency with a Single GPU: An Exploration of Transfer Methods for Small Language Models
Alon Albalak
Akshat Shrivastava
Chinnadhurai Sankar
Adithya Sagar
Mike Ross
34
3
0
08 Oct 2022
Automatic Chain of Thought Prompting in Large Language Models
Automatic Chain of Thought Prompting in Large Language Models
Zhuosheng Zhang
Aston Zhang
Mu Li
Alexander J. Smola
ReLM
LRM
67
575
0
07 Oct 2022
Rainier: Reinforced Knowledge Introspector for Commonsense Question
  Answering
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering
Jiacheng Liu
Skyler Hallinan
Ximing Lu
Pengfei He
Sean Welleck
Hannaneh Hajishirzi
Yejin Choi
RALM
26
59
0
06 Oct 2022
Language Models are Multilingual Chain-of-Thought Reasoners
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
172
327
0
06 Oct 2022
Efficiently Enhancing Zero-Shot Performance of Instruction Following
  Model via Retrieval of Soft Prompt
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Seonghyeon Ye
Joel Jang
Doyoung Kim
Yongrae Jo
Minjoon Seo
VLM
36
2
0
06 Oct 2022
Guess the Instruction! Flipped Learning Makes Language Models Stronger
  Zero-Shot Learners
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
Seonghyeon Ye
Doyoung Kim
Joel Jang
Joongbo Shin
Minjoon Seo
FedML
VLM
UQCV
LRM
19
25
0
06 Oct 2022
Large Language Models are Pretty Good Zero-Shot Video Game Bug Detectors
Large Language Models are Pretty Good Zero-Shot Video Game Bug Detectors
Mohammad Reza Taesiri
Finlay Macklon
Yihe Wang
Hengshuo Shen
C. Bezemer
ELM
LLMAG
MLLM
42
13
0
05 Oct 2022
Ask Me Anything: A simple strategy for prompting language models
Ask Me Anything: A simple strategy for prompting language models
Simran Arora
A. Narayan
Mayee F. Chen
Laurel J. Orr
Neel Guha
Kush S. Bhatia
Ines Chami
Frederic Sala
Christopher Ré
ReLM
LRM
220
208
0
05 Oct 2022
Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Tushar Khot
H. Trivedi
Matthew Finlayson
Yao Fu
Kyle Richardson
Peter Clark
Ashish Sabharwal
ReLM
LRM
70
416
0
05 Oct 2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For
  Correct Goals
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals
Rohin Shah
Vikrant Varma
Ramana Kumar
Mary Phuong
Victoria Krakovna
J. Uesato
Zachary Kenton
37
68
0
04 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing:
  Benchmarks, Baselines, and Building Blocks for Natural Language Policy
  Optimization
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
31
239
0
03 Oct 2022
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of
  Chain-of-Thought
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
121
275
0
03 Oct 2022
Dynamic Prompt Learning via Policy Gradient for Semi-structured
  Mathematical Reasoning
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
Pan Lu
Liang Qiu
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Tanmay Rajpurohit
Peter Clark
A. Kalyan
ReLM
LRM
49
267
0
29 Sep 2022
Bidirectional Language Models Are Also Few-shot Learners
Bidirectional Language Models Are Also Few-shot Learners
Ajay Patel
Bryan Li
Mohammad Sadegh Rasooli
Noah Constant
Colin Raffel
Chris Callison-Burch
LRM
70
45
0
29 Sep 2022
Improving alignment of dialogue agents via targeted human judgements
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
502
0
28 Sep 2022
Argumentative Reward Learning: Reasoning About Human Preferences
Argumentative Reward Learning: Reasoning About Human Preferences
Francis Rhys Ward
Francesco Belardinelli
Francesca Toni
HAI
92
2
0
28 Sep 2022
Generate rather than Retrieve: Large Language Models are Strong Context
  Generators
Generate rather than Retrieve: Large Language Models are Strong Context Generators
W. Yu
Dan Iter
Shuohang Wang
Yichong Xu
Mingxuan Ju
Soumya Sanyal
Chenguang Zhu
Michael Zeng
Meng Jiang
RALM
AIMat
229
321
0
21 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
211
1,106
0
20 Sep 2022
Does CLIP Know My Face?
Does CLIP Know My Face?
Dominik Hintersdorf
Lukas Struppek
Manuel Brack
Felix Friedrich
P. Schramowski
Kristian Kersting
VLM
21
9
0
15 Sep 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial
  Intelligence with Humans
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
John J. Nay
ELM
AILaw
88
27
0
14 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Tengjiao Wang
Ming-Hsuan Yang
DiffM
MedIm
224
1,304
0
02 Sep 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu
Siqi Bao
H. He
Fan Wang
Hua-Hong Wu
Haifeng Wang
ALM
20
18
0
30 Aug 2022
The Alignment Problem from a Deep Learning Perspective
The Alignment Problem from a Deep Learning Perspective
Richard Ngo
Lawrence Chan
Sören Mindermann
59
183
0
30 Aug 2022
PEER: A Collaborative Language Model
PEER: A Collaborative Language Model
Timo Schick
Jane Dwivedi-Yu
Zhengbao Jiang
Fabio Petroni
Patrick Lewis
Gautier Izacard
Qingfei You
Christoforos Nalmpantis
Edouard Grave
Sebastian Riedel
ALM
50
93
0
24 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors,
  and Lessons Learned
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
225
446
0
23 Aug 2022
Using Large Language Models to Simulate Multiple Humans and Replicate
  Human Subject Studies
Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
Gati Aher
RosaI. Arriaga
Adam Tauman Kalai
59
349
0
18 Aug 2022
Abstractive Meeting Summarization: A Survey
Abstractive Meeting Summarization: A Survey
Virgile Rennard
Guokan Shang
Julie Hunter
Michalis Vazirgiannis
34
15
0
08 Aug 2022
Learning New Skills after Deployment: Improving open-domain
  internet-driven dialogue with human feedback
Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback
Jing Xu
Megan Ung
M. Komeili
Kushal Arora
Y-Lan Boureau
Jason Weston
27
37
0
05 Aug 2022
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq
  Model
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan
Shankar Ananthakrishnan
Jack G. M. FitzGerald
Rahul Gupta
Wael Hamza
...
Mukund Sridhar
Fabian Triefenbach
Apurv Verma
Gokhan Tur
Premkumar Natarajan
54
82
0
02 Aug 2022
Previous
123...90919293
Next