Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Wesley Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux
31 August 2020 · arXiv 2008.13773
Topics: OffRL
Papers citing "Beyond variance reduction: Understanding the true impact of baselines on policy optimization" (7 papers)
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux, Marc G. Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alex Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Toth, Sam Work
Topics: OffRL
18 Mar 2025

Behind the Myth of Exploration in Policy Gradients
Adrien Bolland, Gaspard Lambrechts, Damien Ernst
31 Jan 2024

Target-independent XLA optimization using Reinforcement Learning
Milan Ganai, Haichen Li, Theodore Enns, Yida Wang, Randy Huang
28 Aug 2023

The Role of Baselines in Policy Gradient Optimization
Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvári, Dale Schuurmans
16 Jan 2023

When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development
Nghia Duong-Trung, Stefan Born, Jong Woo Kim, M. Schermeyer, Katharina Paulick, ..., Thorben Werner, Randolf Scholz, Lars Schmidt-Thieme, Peter Neubauer, Ernesto Martinez
02 Sep 2022

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
Matilde Gargiani, Andrea Zanelli, Andrea Martinelli, Tyler H. Summers, John Lygeros
01 Feb 2022

Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits
Kaushik Roy, Qi Zhang, Manas Gaur, A. Sheth
Topics: OffRL
25 Jun 2021