ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.05187
  4. Cited By
Learning mirror maps in policy mirror descent
v1v2 (latest)

Learning mirror maps in policy mirror descent

7 February 2024
Carlo Alfano
Sebastian Towers
Silvia Sapora
Chris Xiaoxuan Lu
Patrick Rebeschini
ArXiv (abs)PDFHTML

Papers citing "Learning mirror maps in policy mirror descent"

17 / 17 papers shown
Title
Discovering Temporally-Aware Reinforcement Learning Algorithms
Discovering Temporally-Aware Reinforcement Learning Algorithms
Matthew Jackson
Chris Xiaoxuan Lu
Louis Kirsch
R. T. Lange
Shimon Whiteson
Jakob N. Foerster
108
18
0
08 Feb 2024
Decision-Aware Actor-Critic with Function Approximation and Theoretical
  Guarantees
Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees
Sharan Vaswani
A. Kazemi
Reza Babanezhad
Nicolas Le Roux
OffRL
74
4
0
24 May 2023
Policy Mirror Descent Inherently Explores Action Space
Policy Mirror Descent Inherently Explores Action Space
Yan Li
Guanghui Lan
OffRL
113
8
0
08 Mar 2023
Adversarial Cheap Talk
Adversarial Cheap Talk
Chris Xiaoxuan Lu
Timon Willi
Alistair Letcher
Jakob N. Foerster
AAML
79
16
0
20 Nov 2022
Discovered Policy Optimisation
Discovered Policy Optimisation
Chris Xiaoxuan Lu
J. Kuba
Alistair Letcher
Luke Metz
Christian Schroeder de Witt
Jakob N. Foerster
OffRL
79
79
0
11 Oct 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLMALM
891
13,228
0
04 Mar 2022
Dota 2 with Large Scale Deep Reinforcement Learning
Dota 2 with Large Scale Deep Reinforcement Learning
OpenAI OpenAI
:
Christopher Berner
Greg Brockman
Brooke Chan
...
Szymon Sidor
Ilya Sutskever
Jie Tang
Filip Wolski
Susan Zhang
GNNVLMCLLAI4CELRM
169
1,838
0
13 Dec 2019
Kinematic State Abstraction and Provably Efficient Rich-Observation
  Reinforcement Learning
Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning
Dipendra Kumar Misra
Mikael Henaff
A. Krishnamurthy
John Langford
85
151
0
13 Nov 2019
Optuna: A Next-generation Hyperparameter Optimization Framework
Optuna: A Next-generation Hyperparameter Optimization Framework
Takuya Akiba
Shotaro Sano
Toshihiko Yanase
Takeru Ohta
Masanori Koyama
679
5,872
0
25 Jul 2019
Regularized Evolution for Image Classifier Architecture Search
Regularized Evolution for Image Classifier Architecture Search
Esteban Real
A. Aggarwal
Yanping Huang
Quoc V. Le
185
3,039
0
05 Feb 2018
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative
  for Training Deep Neural Networks for Reinforcement Learning
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
F. Such
Vashisht Madhavan
Edoardo Conti
Joel Lehman
Kenneth O. Stanley
Jeff Clune
118
695
0
18 Dec 2017
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
577
19,315
0
20 Jul 2017
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Tim Salimans
Jonathan Ho
Xi Chen
Szymon Sidor
Ilya Sutskever
122
1,544
0
10 Mar 2017
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving
Shai Shalev-Shwartz
Shaked Shammah
Amnon Shashua
120
840
0
11 Oct 2016
Asynchronous Methods for Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
M. Berk Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
210
8,882
0
04 Feb 2016
High-Dimensional Continuous Control Using Generalized Advantage
  Estimation
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
135
3,442
0
08 Jun 2015
Infinite-Horizon Policy-Gradient Estimation
Infinite-Horizon Policy-Gradient Estimation
Jonathan Baxter
Peter L. Bartlett
111
812
0
03 Jun 2011
1