ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

An Invitation to Deep Reinforcement Learning
Versions: v1, v2, v3 (latest)

13 December 2023
Bernhard Jaeger
Andreas Geiger
    OffRL, OOD

Papers citing "An Invitation to Deep Reinforcement Learning"

50 / 108 papers shown
CaRL: Learning Scalable Planning Policies with Simple Rewards
Bernhard Jaeger
D. Dauner
Jens Beißwenger
Simon Gerstenecker
Kashyap Chitta
Andreas Geiger
127
2
0
24 Apr 2025
Causally Aligned Curriculum Learning
Mingxuan Li
Junzhe Zhang
Elias Bareinboim
CML
102
4
0
21 Mar 2025
VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training
Mohammad Nazeri
Junzhe Wang
Amirreza Payandeh
Xuesu Xiao
SSL, ViT
101
8
0
12 Mar 2024
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace
Meihua Dang
Rafael Rafailov
Linqi Zhou
Aaron Lou
Senthil Purushwalkam
Stefano Ermon
Caiming Xiong
Shafiq Joty
Nikhil Naik
EGVM
159
288
0
21 Nov 2023
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Tianhao Wu
Banghua Zhu
Ruoyu Zhang
Zhaojin Wen
Kannan Ramchandran
Jiantao Jiao
106
61
0
30 Sep 2023
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware
Nico Gürtler
Sebastian Blaes
Pavel Kolev
Felix Widmaier
Manuel Wüthrich
Stefan Bauer
Bernhard Schölkopf
Georg Martius
OffRL
100
30
0
28 Jul 2023
End-to-end Autonomous Driving: Challenges and Frontiers
Li Chen
Peng Wu
Kashyap Chitta
Bernhard Jaeger
Andreas Geiger
Hongyang Li
3DV
199
317
0
29 Jun 2023
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Max Schwarzer
J. Obando-Ceron
Aaron Courville
Marc G. Bellemare
Rishabh Agarwal
Pablo Samuel Castro
OffRL
124
102
0
30 May 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
405
4,187
0
29 May 2023
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Ying Fan
Olivia Watkins
Yuqing Du
Hao Liu
Moonkyung Ryu
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
Kangwook Lee
Kimin Lee
167
167
0
25 May 2023
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
158
379
0
22 May 2023
A Tutorial Introduction to Reinforcement Learning
M. Vidyasagar
58
6
0
03 Apr 2023
Tuning computer vision models with task rewards
André Susano Pinto
Alexander Kolesnikov
Yuge Shi
Lucas Beyer
Xiaohua Zhai
VLM
85
41
0
16 Feb 2023
Mastering Diverse Domains through World Models
Danijar Hafner
J. Pašukonis
Jimmy Ba
Timothy Lillicrap
94
617
0
10 Jan 2023
General Intelligence Requires Rethinking Exploration
Minqi Jiang
Tim Rocktaschel
Edward Grefenstette
LRM
81
20
0
15 Nov 2022
Redeeming Intrinsic Rewards via Constrained Optimization
Eric Chen
Zhang-Wei Hong
Joni Pajarinen
Pulkit Agrawal
OnRL
111
27
0
14 Nov 2022
Human-level Atari 200x faster
Steven Kapturowski
Victor Campos
Ray Jiang
Nemanja Rakićević
Hado van Hasselt
Charles Blundell
Adria Puigdomenech Badia
OffRL
94
30
0
15 Sep 2022
Stabilizing Off-Policy Deep Reinforcement Learning from Pixels
Edoardo Cetin
Philip J. Ball
Steve Roberts
Oya Celiktutan
112
38
0
03 Jul 2022
Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
Julien Perolat
Bart De Vylder
Daniel Hennes
Eugene Tarassov
Florian Strub
...
Rémi Munos
David Silver
Satinder Singh
Demis Hassabis
K. Tuyls
101
206
0
30 Jun 2022
The Phenomenon of Policy Churn
Tom Schaul
André Barreto
John Quan
Georg Ostrovski
89
28
0
01 Jun 2022
The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin
Max Schwarzer
P. D'Oro
Pierre-Luc Bacon
Aaron Courville
OnRL
150
197
0
16 May 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
262
2,630
0
12 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
936
13,282
0
04 Mar 2022
A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems
Rafael Figueiredo Prudencio
Marcos R. O. A. Máximo
Esther Luna Colombini
OffRL
111
244
0
02 Mar 2022
Evolving Curricula with Regret-Based Environment Design
Jack Parker-Holder
Minqi Jiang
Michael Dennis
Mikayel Samvelyan
Jakob N. Foerster
Edward Grefenstette
Tim Rocktaschel
116
125
0
02 Mar 2022
Learning robust perceptive locomotion for quadrupedal robots in the wild
Takahiro Miki
Joonho Lee
Jemin Hwangbo
Lorenz Wellhausen
V. Koltun
Marco Hutter
137
716
0
20 Jan 2022
GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving
Raphael Chekroun
Marin Toromanoff
Sascha Hornauer
Fabien Moutarde
92
61
0
16 Nov 2021
Deep Reinforcement Learning at the Edge of the Statistical Precipice
Rishabh Agarwal
Max Schwarzer
Pablo Samuel Castro
Aaron Courville
Marc G. Bellemare
OffRL
190
680
0
30 Aug 2021
End-to-End Urban Driving by Imitating a Reinforcement Learning Coach
Zhejun Zhang
Alexander Liniger
Dengxin Dai
Feng Yu
Luc Van Gool
116
211
0
18 Aug 2021
Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning
Denis Yarats
Rob Fergus
A. Lazaric
Lerrel Pinto
OffRL
133
353
0
20 Jul 2021
Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen
Kevin Lu
Aravind Rajeswaran
Kimin Lee
Aditya Grover
Michael Laskin
Pieter Abbeel
A. Srinivas
Igor Mordatch
OffRL
196
1,669
0
02 Jun 2021
MetricOpt: Learning to Optimize Black-Box Evaluation Metrics
Chen Huang
Shuangfei Zhai
Pengsheng Guo
J. Susskind
97
12
0
21 Apr 2021
Reinforcement Learning, Bit by Bit
Xiuyuan Lu
Benjamin Van Roy
Vikranth Dwaracherla
M. Ibrahimi
Ian Osband
Zheng Wen
126
70
0
06 Mar 2021
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Michael Dennis
Natasha Jaques
Eugene Vinitsky
Alexandre M. Bayen
Stuart J. Russell
Andrew Critch
Sergey Levine
114
237
0
03 Dec 2020
Exploring Simple Siamese Representation Learning
Xinlei Chen
Kaiming He
SSL
381
4,083
0
20 Nov 2020
Learning Quadrupedal Locomotion over Challenging Terrain
Joonho Lee
Jemin Hwangbo
Lorenz Wellhausen
V. Koltun
Marco Hutter
164
1,185
0
21 Oct 2020
Prioritized Level Replay
Minqi Jiang
Edward Grefenstette
Tim Rocktaschel
OffRL
128
160
0
08 Oct 2020
Mastering Atari with Discrete World Models
Danijar Hafner
Timothy Lillicrap
Mohammad Norouzi
Jimmy Ba
DRL
192
876
0
05 Oct 2020
Phasic Policy Gradient
K. Cobbe
Jacob Hilton
Oleg Klimov
John Schulman
OffRL
78
160
0
09 Sep 2020
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
304
2,195
0
02 Sep 2020
Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning
Aleksei Petrenko
Zhehui Huang
T. Kumar
Gaurav Sukhatme
V. Koltun
103
105
0
21 Jun 2020
What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
Marcin Andrychowicz
Anton Raichuk
Piotr Stańczyk
Manu Orsini
Sertan Girgin
...
Matthieu Geist
Olivier Pietquin
Marcin Michalski
Sylvain Gelly
Olivier Bachem
OffRL
92
225
0
10 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
1.2K
42,712
0
28 May 2020
Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
Logan Engstrom
Andrew Ilyas
Shibani Santurkar
Dimitris Tsipras
Firdaus Janoos
L. Rudolph
Aleksander Madry
AAML
87
229
0
25 May 2020
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL, GP
582
2,051
0
04 May 2020
Reinforcement Learning with Augmented Data
Michael Laskin
Kimin Lee
Adam Stooke
Lerrel Pinto
Pieter Abbeel
A. Srinivas
OffRL
165
661
0
30 Apr 2020
Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels
Ilya Kostrikov
Denis Yarats
Rob Fergus
OffRL
183
794
0
28 Apr 2020
First return, then explore
Adrien Ecoffet
Joost Huizinga
Joel Lehman
Kenneth O. Stanley
Jeff Clune
108
365
0
27 Apr 2020
Shortcut Learning in Deep Neural Networks
Robert Geirhos
J. Jacobsen
Claudio Michaelis
R. Zemel
Wieland Brendel
Matthias Bethge
Felix Wichmann
231
2,073
0
16 Apr 2020
Agent57: Outperforming the Atari Human Benchmark
Adria Puigdomenech Badia
Bilal Piot
Steven Kapturowski
Pablo Sprechmann
Alex Vitvitskyi
Daniel Guo
Charles Blundell
OffRL
109
521
0
30 Mar 2020