The Wisdom of Hindsight Makes Language Models Better Instruction Followers

10 February 2023
Tianjun Zhang
Fangchen Liu
Justin Wong
Pieter Abbeel
Joseph E. Gonzalez

Papers citing "The Wisdom of Hindsight Makes Language Models Better Instruction Followers"

27 / 27 papers shown
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training
Xinyan Chen
Jiaxin Ge
Tianjun Zhang
Jiaming Liu
Shanghang Zhang
VLM
EGVM
92
0
0
23 Dec 2023
In-context Reinforcement Learning with Algorithm Distillation
Michael Laskin
Luyu Wang
Junhyuk Oh
Emilio Parisotto
Stephen Spencer
...
Ethan A. Brooks
Maxime Gazeau
Himanshu Sahni
Satinder Singh
Volodymyr Mnih
OffRL
59
129
0
25 Oct 2022
Large Language Models Can Self-Improve
Jiaxin Huang
S. Gu
Le Hou
Yuexin Wu
Xuezhi Wang
Hongkun Yu
Jiawei Han
ReLM
AI4MH
LRM
177
608
0
20 Oct 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
181
3,117
0
20 Oct 2022
Efficient Planning in a Compact Latent Action Space
Zhengyao Jiang
Tianjun Zhang
Michael Janner
Yueying Li
Tim Rocktäschel
Edward Grefenstette
Yuandong Tian
OffRL
73
39
0
22 Aug 2022
Solving Quantitative Reasoning Problems with Language Models
Aitor Lewkowycz
Anders Andreassen
David Dohan
Ethan Dyer
Henryk Michalewski
...
Theo Gutman-Solo
Yuhuai Wu
Behnam Neyshabur
Guy Gur-Ari
Vedant Misra
ReLM
ELM
LRM
172
833
0
29 Jun 2022
Contrastive Learning as Goal-Conditioned Reinforcement Learning
Benjamin Eysenbach
Tianjun Zhang
Ruslan Salakhutdinov
Sergey Levine
SSL
OffRL
82
159
0
15 Jun 2022
Online Decision Transformer
Qinqing Zheng
Amy Zhang
Aditya Grover
OffRL
67
209
0
11 Feb 2022
Competition-Level Code Generation with AlphaCode
Yujia Li
David Choi
Junyoung Chung
Nate Kushman
Julian Schrittwieser
...
Esme Sutherland Robson
Pushmeet Kohli
Nando de Freitas
Koray Kavukcuoglu
Oriol Vinyals
118
1,379
0
08 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
808
9,351
0
28 Jan 2022
Ethical and social risks of harm from Language Models
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
...
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
108
1,034
0
08 Dec 2021
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
118
779
0
01 Dec 2021
Recursively Summarizing Books with Human Feedback
Jeff Wu
Long Ouyang
Daniel M. Ziegler
Nisan Stiennon
Ryan J. Lowe
Jan Leike
Paul Christiano
ALM
154
303
0
22 Sep 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
224
5,518
0
07 Jul 2021
Offline Reinforcement Learning as One Big Sequence Modeling Problem
Michael Janner
Qiyang Li
Sergey Levine
OffRL
126
677
0
03 Jun 2021
True Few-Shot Learning with Language Models
Ethan Perez
Douwe Kiela
Kyunghyun Cho
126
437
0
24 May 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
558
4,036
0
18 Apr 2021
GPT Understands, Too
Xiao Liu
Yanan Zheng
Zhengxiao Du
Ming Ding
Yujie Qian
Zhilin Yang
Jie Tang
VLM
165
1,173
0
18 Mar 2021
Conservative Q-Learning for Offline Reinforcement Learning
Aviral Kumar
Aurick Zhou
George Tucker
Sergey Levine
OffRL
OnRL
137
1,812
0
08 Jun 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
463
1,727
0
18 Sep 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
256
2,312
0
02 May 2019
Scalable agent alignment via reward modeling: a research direction
Jan Leike
David M. Krueger
Tom Everitt
Miljan Martic
Vishal Maini
Shane Legg
88
413
0
19 Nov 2018
HG-DAgger: Interactive Imitation Learning with Human Experts
Michael Kelly
Chelsea Sidrane
Katherine Driggs-Campbell
Mykel J. Kochenderfer
OffRL
223
228
0
05 Oct 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,154
0
20 Apr 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
493
19,019
0
20 Jul 2017
Hindsight Experience Replay
Marcin Andrychowicz
Dwight Crow
Alex Ray
Jonas Schneider
Rachel Fong
Peter Welinder
Bob McGrew
Joshua Tobin
Pieter Abbeel
Wojciech Zaremba
OffRL
248
2,326
0
05 Jul 2017
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
Wang Ling
Dani Yogatama
Chris Dyer
Phil Blunsom
AIMat
79
728
0
11 May 2017