Chain of Thought Imitation with Procedure Cloning

22 May 2022

Dale Schuurmans

Pieter Abbeel

Papers cited by "Chain of Thought Imitation with Procedure Cloning"

12 / 62 papers shown

Title | Authors | Tags | Counts | Date
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World | Joshua Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, Pieter Abbeel | | 259 / 2,972 / 0 | 20 Mar 2017
Minimax Regret Bounds for Reinforcement Learning | M. G. Azar, Ian Osband, Rémi Munos | | 92 / 778 / 0 | 16 Mar 2017
Reinforcement Learning with Unsupervised Auxiliary Tasks | Max Jaderberg, Volodymyr Mnih, Wojciech M. Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, Koray Kavukcuoglu | SSL | 111 / 1,229 / 0 | 16 Nov 2016
Learning to Navigate in Complex Environments | Piotr Wojciech Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andy Ballard, ..., Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, D. Kumaran, R. Hadsell | | 107 / 880 / 0 | 11 Nov 2016
Playing FPS Games with Deep Reinforcement Learning | Guillaume Lample, Devendra Singh Chaplot | OffRL, EgoV | 89 / 587 / 0 | 18 Sep 2016
Value Iteration Networks | Aviv Tamar, Yi Wu, G. Thomas, Sergey Levine, Pieter Abbeel | | 79 / 654 / 0 | 09 Feb 2016
Neural Programmer-Interpreters | Scott E. Reed, Nando de Freitas | | 101 / 410 / 0 | 19 Nov 2015
Recurrent Reinforcement Learning: A Hybrid Approach | Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, Ji He | OffRL | 64 / 77 / 0 | 10 Sep 2015
Massively Parallel Methods for Deep Reinforcement Learning | Arun Nair, Praveen Srinivasan, Sam Blackwell, Cagdas Alcicek, Rory Fearon, ..., Stig Petersen, Shane Legg, Volodymyr Mnih, Koray Kavukcuoglu, David Silver | OffRL, AI4CE, GNN | 102 / 504 / 0 | 15 Jul 2015
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits | Alekh Agarwal, Daniel J. Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire | OffRL | 410 / 510 / 0 | 04 Feb 2014
The Arcade Learning Environment: An Evaluation Platform for General Agents | Marc G. Bellemare, Yavar Naddaf, J. Veness, Michael Bowling | | 120 / 3,021 / 0 | 19 Jul 2012
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning | Stéphane Ross, Geoffrey J. Gordon, J. Andrew Bagnell | OffRL | 244 / 3,233 / 0 | 02 Nov 2010