Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning
Platform

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

29 September 2023

Rujikorn Charakorn

Santiago Ontañón

ArXiv (abs)PDF HTML Github (112★)

Papers citing "Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform"

14 / 14 papers shown

Title
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models Michael Noukhovitch Shengyi Huang Sophie Xhonneux Arian Hosseini Rishabh Agarwal Rameswar Panda OffRL 136 11 0 23 Oct 2024
Deep Reinforcement Learning at the Edge of the Statistical Precipice Rishabh Agarwal Max Schwarzer Pablo Samuel Castro Aaron Courville Marc G. Bellemare OffRL 118 673 0 30 Aug 2021
Podracer architectures for scalable Reinforcement Learning Matteo Hessel M. Kroiss Aidan Clark Iurii Kemaev John Quan Thomas Keck Fabio Viola H. V. Hasselt 54 39 0 13 Apr 2021
High-Throughput Synchronous Deep RL Iou-Jen Liu Raymond A. Yeh Alex Schwing OffRL 47 12 0 17 Dec 2020
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames Erik Wijmans Abhishek Kadian Ari S. Morcos Stefan Lee Irfan Essa Devi Parikh Manolis Savva Dhruv Batra 85 484 0 01 Nov 2019
An Empirical Model of Large-Batch Training Sam McCandlish Jared Kaplan Dario Amodei OpenAI Dota Team 67 278 0 14 Dec 2018
Accelerated Methods for Deep Reinforcement Learning Adam Stooke Pieter Abbeel OffRL OnRL 59 136 0 07 Mar 2018
Distributed Prioritized Experience Replay Dan Horgan John Quan David Budden Gabriel Barth-Maron Matteo Hessel H. V. Hasselt David Silver 147 741 0 02 Mar 2018
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures L. Espeholt Hubert Soyer Rémi Munos Karen Simonyan Volodymyr Mnih ... Vlad Firoiu Tim Harley Iain Dunning Shane Legg Koray Kavukcuoglu 220 1,605 0 05 Feb 2018
Deep Reinforcement Learning that Matters Peter Henderson Riashat Islam Philip Bachman Joelle Pineau Doina Precup David Meger OffRL 118 1,961 0 19 Sep 2017
Proximal Policy Optimization Algorithms John Schulman Filip Wolski Prafulla Dhariwal Alec Radford Oleg Klimov OffRL 517 19,237 0 20 Jul 2017
Asynchronous Methods for Deep Reinforcement Learning Volodymyr Mnih Adria Puigdomenech Badia M. Berk Mirza Alex Graves Timothy Lillicrap Tim Harley David Silver Koray Kavukcuoglu 202 8,875 0 04 Feb 2016
High-Dimensional Continuous Control Using Generalized Advantage Estimation John Schulman Philipp Moritz Sergey Levine Michael I. Jordan Pieter Abbeel OffRL 104 3,434 0 08 Jun 2015
The Arcade Learning Environment: An Evaluation Platform for General Agents Marc G. Bellemare Yavar Naddaf J. Veness Michael Bowling 120 3,020 0 19 Jul 2012