Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.08365
Cited By
v1
v2
v3 (latest)
An Invitation to Deep Reinforcement Learning
13 December 2023
Bernhard Jaeger
Andreas Geiger
OffRL
OOD
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"An Invitation to Deep Reinforcement Learning"
50 / 108 papers shown
Title
CaRL: Learning Scalable Planning Policies with Simple Rewards
Bernhard Jaeger
D. Dauner
Jens Beißwenger
Simon Gerstenecker
Kashyap Chitta
Andreas Geiger
127
2
0
24 Apr 2025
Causally Aligned Curriculum Learning
Mingxuan Li
Junzhe Zhang
Elias Bareinboim
CML
102
4
0
21 Mar 2025
VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training
Mohammad Nazeri
Junzhe Wang
Amirreza Payandeh
Xuesu Xiao
SSL
ViT
101
8
0
12 Mar 2024
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace
Meihua Dang
Rafael Rafailov
Linqi Zhou
Aaron Lou
Senthil Purushwalkam
Stefano Ermon
Caiming Xiong
Shafiq Joty
Nikhil Naik
EGVM
159
288
0
21 Nov 2023
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Tianhao Wu
Banghua Zhu
Ruoyu Zhang
Zhaojin Wen
Kannan Ramchandran
Jiantao Jiao
106
61
0
30 Sep 2023
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware
Nico Gürtler
Sebastian Blaes
Pavel Kolev
Felix Widmaier
Manuel Wüthrich
Stefan Bauer
Bernhard Schölkopf
Georg Martius
OffRL
100
30
0
28 Jul 2023
End-to-end Autonomous Driving: Challenges and Frontiers
Li Chen
Peng Wu
Kashyap Chitta
Bernhard Jaeger
Andreas Geiger
Hongyang Li
3DV
199
317
0
29 Jun 2023
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Max Schwarzer
J. Obando-Ceron
Rameswar Panda
Marc G. Bellemare
Rishabh Agarwal
Pablo Samuel Castro
OffRL
124
102
0
30 May 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
405
4,187
0
29 May 2023
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Ying Fan
Olivia Watkins
Yuqing Du
Hao Liu
Moonkyung Ryu
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
Kangwook Lee
Kimin Lee
167
167
0
25 May 2023
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
158
379
0
22 May 2023
A Tutorial Introduction to Reinforcement Learning
M. Vidyasagar
58
6
0
03 Apr 2023
Tuning computer vision models with task rewards
André Susano Pinto
Alexander Kolesnikov
Yuge Shi
Lucas Beyer
Xiaohua Zhai
VLM
85
41
0
16 Feb 2023
Mastering Diverse Domains through World Models
Danijar Hafner
J. Pašukonis
Jimmy Ba
Timothy Lillicrap
94
617
0
10 Jan 2023
General Intelligence Requires Rethinking Exploration
Minqi Jiang
Tim Rocktaschel
Edward Grefenstette
LRM
81
20
0
15 Nov 2022
Redeeming Intrinsic Rewards via Constrained Optimization
Eric Chen
Zhang-Wei Hong
Joni Pajarinen
Pulkit Agrawal
OnRL
111
27
0
14 Nov 2022
Human-level Atari 200x faster
Steven Kapturowski
Victor Campos
Ray Jiang
Nemanja Rakićević
Hado van Hasselt
Charles Blundell
Adria Puigdomenech Badia
OffRL
94
30
0
15 Sep 2022
Stabilizing Off-Policy Deep Reinforcement Learning from Pixels
Edoardo Cetin
Philip J. Ball
Steve Roberts
Oya Celiktutan
112
38
0
03 Jul 2022
Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Julien Perolat
Bart De Vylder
Daniel Hennes
Eugene Tarassov
Florian Strub
...
Rémi Munos
David Silver
Satinder Singh
Demis Hassabis
K. Tuyls
101
206
0
30 Jun 2022
The Phenomenon of Policy Churn
Tom Schaul
André Barreto
John Quan
Georg Ostrovski
89
28
0
01 Jun 2022
The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin
Max Schwarzer
P. DÓro
Pierre-Luc Bacon
Rameswar Panda
OnRL
150
197
0
16 May 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
262
2,630
0
12 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
936
13,282
0
04 Mar 2022
A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems
Rafael Figueiredo Prudencio
Marcos R. O. A. Máximo
Esther Luna Colombini
OffRL
111
244
0
02 Mar 2022
Evolving Curricula with Regret-Based Environment Design
Jack Parker-Holder
Minqi Jiang
Michael Dennis
Mikayel Samvelyan
Jakob N. Foerster
Edward Grefenstette
Tim Rocktaschel
116
125
0
02 Mar 2022
Learning robust perceptive locomotion for quadrupedal robots in the wild
Takahiro Miki
Joonho Lee
Jemin Hwangbo
Lorenz Wellhausen
V. Koltun
Marco Hutter
137
716
0
20 Jan 2022
GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving
Raphael Chekroun
Marin Toromanoff
Sascha Hornauer
Fabien Moutarde
92
61
0
16 Nov 2021
Deep Reinforcement Learning at the Edge of the Statistical Precipice
Rishabh Agarwal
Max Schwarzer
Pablo Samuel Castro
Aaron Courville
Marc G. Bellemare
OffRL
190
680
0
30 Aug 2021
End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
Zhejun Zhang
Alexander Liniger
Dengxin Dai
Feng Yu
Luc Van Gool
116
211
0
18 Aug 2021
Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning
Denis Yarats
Rob Fergus
A. Lazaric
Lerrel Pinto
OffRL
133
353
0
20 Jul 2021
Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen
Kevin Lu
Aravind Rajeswaran
Kimin Lee
Aditya Grover
Michael Laskin
Pieter Abbeel
A. Srinivas
Igor Mordatch
OffRL
196
1,669
0
02 Jun 2021
MetricOpt: Learning to Optimize Black-Box Evaluation Metrics
Chen Huang
Shuangfei Zhai
Pengsheng Guo
J. Susskind
97
12
0
21 Apr 2021
Reinforcement Learning, Bit by Bit
Xiuyuan Lu
Benjamin Van Roy
Vikranth Dwaracherla
M. Ibrahimi
Ian Osband
Zheng Wen
126
70
0
06 Mar 2021
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Michael Dennis
Natasha Jaques
Eugene Vinitsky
Alexandre M. Bayen
Stuart J. Russell
Andrew Critch
Sergey Levine
114
237
0
03 Dec 2020
Exploring Simple Siamese Representation Learning
Xinlei Chen
Kaiming He
SSL
381
4,083
0
20 Nov 2020
Learning Quadrupedal Locomotion over Challenging Terrain
Joonho Lee
Jemin Hwangbo
Lorenz Wellhausen
V. Koltun
Marco Hutter
164
1,185
0
21 Oct 2020
Prioritized Level Replay
Minqi Jiang
Edward Grefenstette
Tim Rocktaschel
OffRL
128
160
0
08 Oct 2020
Mastering Atari with Discrete World Models
Danijar Hafner
Timothy Lillicrap
Mohammad Norouzi
Jimmy Ba
DRL
192
876
0
05 Oct 2020
Phasic Policy Gradient
K. Cobbe
Jacob Hilton
Oleg Klimov
John Schulman
OffRL
78
160
0
09 Sep 2020
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
304
2,195
0
02 Sep 2020
Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning
Aleksei Petrenko
Zhehui Huang
T. Kumar
Gaurav Sukhatme
V. Koltun
103
105
0
21 Jun 2020
What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
Marcin Andrychowicz
Anton Raichuk
Piotr Stańczyk
Manu Orsini
Sertan Girgin
...
Matthieu Geist
Olivier Pietquin
Marcin Michalski
Sylvain Gelly
Olivier Bachem
OffRL
92
225
0
10 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
1.2K
42,712
0
28 May 2020
Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
Logan Engstrom
Andrew Ilyas
Shibani Santurkar
Dimitris Tsipras
Firdaus Janoos
L. Rudolph
Aleksander Madry
AAML
87
229
0
25 May 2020
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
582
2,051
0
04 May 2020
Reinforcement Learning with Augmented Data
Michael Laskin
Kimin Lee
Adam Stooke
Lerrel Pinto
Pieter Abbeel
A. Srinivas
OffRL
165
661
0
30 Apr 2020
Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels
Ilya Kostrikov
Denis Yarats
Rob Fergus
OffRL
183
794
0
28 Apr 2020
First return, then explore
Adrien Ecoffet
Joost Huizinga
Joel Lehman
Kenneth O. Stanley
Jeff Clune
108
365
0
27 Apr 2020
Shortcut Learning in Deep Neural Networks
Robert Geirhos
J. Jacobsen
Claudio Michaelis
R. Zemel
Wieland Brendel
Matthias Bethge
Felix Wichmann
231
2,073
0
16 Apr 2020
Agent57: Outperforming the Atari Human Benchmark
Adria Puigdomenech Badia
Bilal Piot
Steven Kapturowski
Pablo Sprechmann
Alex Vitvitskyi
Daniel Guo
Charles Blundell
OffRL
109
521
0
30 Mar 2020
1
2
3
Next