Policy Mirror Descent with Lookahead

21 March 2024

Papers citing "Policy Mirror Descent with Lookahead"

Title | Authors | Tags | Counts | Date
Ordering-based Conditions for Global Convergence of Policy Gradient Methods | Jincheng Mei, Bo Dai, Alekh Agarwal, Mohammad Ghavamzadeh, Csaba Szepesvári, Dale Schuurmans | | 122 4 0 | 02 Apr 2025
Functional Acceleration for Policy Mirror Descent | Veronica Chelu, Doina Precup | | 62 0 0 | 23 Jul 2024
Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM, ALM | 806 12,893 0 | 04 Mar 2022
Planning and Learning with Adaptive Lookahead | Aviv A. Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal | | 58 8 0 | 28 Jan 2022
The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation | Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant | | 45 3 0 | 28 Sep 2021
Monte-Carlo Tree Search as Regularized Policy Optimization | Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos | | 74 75 0 | 24 Jul 2020
Global Optimality Guarantees For Policy Gradient Methods | Jalaj Bhandari, Daniel Russo | | 72 193 0 | 05 Jun 2019
How to Combine Tree-Search Methods in Reinforcement Learning | Yonathan Efroni, Gal Dalal, B. Scherrer, Shie Mannor | | 51 31 0 | 06 Sep 2018
Beyond the One Step Greedy Approach in Reinforcement Learning | Yonathan Efroni, Gal Dalal, B. Scherrer, Shie Mannor | OffRL | 80 50 0 | 10 Feb 2018
Improved and Generalized Upper Bounds on the Complexity of Policy Iteration | B. Scherrer | | 80 76 0 | 03 Jun 2013