Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Wesley Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux
31 August 2020 · arXiv 2008.13773
Topics: OffRL
Papers citing "Beyond variance reduction: Understanding the true impact of baselines on policy optimization" (7 papers)
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux, Marc G. Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alex Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Toth, Sam Work
Topics: OffRL
18 Mar 2025

Behind the Myth of Exploration in Policy Gradients
Adrien Bolland, Gaspard Lambrechts, Damien Ernst
31 Jan 2024

Target-independent XLA optimization using Reinforcement Learning
Milan Ganai, Haichen Li, Theodore Enns, Yida Wang, Randy Huang
28 Aug 2023

The Role of Baselines in Policy Gradient Optimization
Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvári, Dale Schuurmans
16 Jan 2023

When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development
Nghia Duong-Trung, Stefan Born, Jong Woo Kim, M. Schermeyer, Katharina Paulick, ..., Thorben Werner, Randolf Scholz, Lars Schmidt-Thieme, Peter Neubauer, Ernesto Martinez
02 Sep 2022

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
Matilde Gargiani, Andrea Zanelli, Andrea Martinelli, Tyler H. Summers, John Lygeros
01 Feb 2022

Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits
Kaushik Roy, Qi Zhang, Manas Gaur, A. Sheth
Topics: OffRL
25 Jun 2021