Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

31 March 2015

Papers citing "Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning"

Title
Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize | Huizhen Yu | 60 30 0 | 23 Nov 2015
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models | E. Kaufmann, Olivier Cappé, Aurélien Garivier | 193 1,025 0 | 16 Jul 2014
Off-Policy Actor-Critic | T. Degris, Martha White, R. Sutton | OffRL, CML | 230 220 0 | 22 May 2012
Convergence and Convergence Rate of Stochastic Gradient Search in the Case of Multiple and Non-Isolated Extrema | V. Tadic | 102 19 0 | 06 Jul 2009