Provably Correct Optimization and Exploration with Non-linear Policies

22 March 2021

Papers citing "Provably Correct Optimization and Exploration with Non-linear Policies"

Title | Authors | Tags | Counts | Date
The Central Role of the Loss Function in Reinforcement Learning | Kaiwen Wang, Nathan Kallus, Wen Sun | OffRL | 268 10 0 | 19 Sep 2024
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs | Alekh Agarwal, Sham Kakade, A. Krishnamurthy, Wen Sun | OffRL | 173 227 0 | 18 Jun 2020
Optimistic Policy Optimization with Bandit Feedback | Yonathan Efroni, Lior Shani, Aviv A. Rosenberg, Shie Mannor | | 63 90 0 | 19 Feb 2020
Provably Efficient Exploration in Policy Optimization | Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang | | 85 283 0 | 12 Dec 2019
Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning | Dipendra Kumar Misra, Mikael Henaff, A. Krishnamurthy, John Langford | | 83 151 0 | 13 Nov 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation | Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan | | 109 560 0 | 11 Jul 2019
Global Optimality Guarantees For Policy Gradient Methods | Jalaj Bhandari, Daniel Russo | | 93 193 0 | 05 Jun 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound | Lin F. Yang, Mengdi Wang | OffRL, GP | 79 288 0 | 24 May 2019
A Theory of Regularized Markov Decision Processes | Matthieu Geist, B. Scherrer, Olivier Pietquin | | 137 333 0 | 31 Jan 2019
Practical Contextual Bandits with Regression Oracles | Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert Schapire | | 395 127 0 | 03 Mar 2018
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor | Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine | | 317 8,420 0 | 04 Jan 2018
Asynchronous Methods for Deep Reinforcement Learning | Volodymyr Mnih, Adria Puigdomenech Badia, M. Berk Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu | | 210 8,881 0 | 04 Feb 2016