arXiv:1512.08562 (v4)
Taming the Noise in Reinforcement Learning via Soft Updates
28 December 2015
Roy Fox
Ari Pakman
Naftali Tishby
Papers citing "Taming the Noise in Reinforcement Learning via Soft Updates" (14 papers)
Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo. Matthieu Meunier, C. Reisinger, Yufei Zhang. 27 Mar 2025.
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees. Yongtao Wu, Luca Viano, Yihang Chen, Zhenyu Zhu, Kimon Antonakopoulos, Quanquan Gu, Volkan Cevher. 18 Feb 2025.
Divergence-Augmented Policy Optimization. Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang. 28 Jan 2025. [OffRL]
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion. Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Bill Wu, Eugene Choi, ..., Arash Ahmadian, Yash Chandak, M. G. Azar, Olivier Pietquin, Matthieu Geist. 17 Jan 2025. [OffRL]
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation. Eliot Xing, Vernon Luk, Jean Oh. 16 Dec 2024.
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF. Heyang Zhao, Chenlu Ye, Quanquan Gu, Tong Zhang. 07 Nov 2024. [OffRL]
The Evolution of Reinforcement Learning in Quantitative Finance: A Survey. Nikolaos Pippas, Cagatay Turkay, Elliot A. Ludvig. 20 Aug 2024. [AIFin]
Value Improved Actor Critic Algorithms. Yaniv Oren, Moritz A. Zanger, Pascal R. van der Vaart, M. Spaan, Wendelin Böhmer. 03 Jun 2024. [OffRL]
Imitation-regularized Optimal Transport on Networks: Provable Robustness and Application to Logistics Planning. Koshi Oishi, Yota Hashizume, Tomohiko Jimbo, Hirotaka Kaji, Kenji Kashima. 28 Feb 2024. [OOD]
Bridging the Gap Between Value and Policy Based Reinforcement Learning. Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans. 28 Feb 2017.
Increasing the Action Gap: New Operators for Reinforcement Learning. Marc G. Bellemare, Georg Ostrovski, A. Guez, Philip S. Thomas, Rémi Munos. 15 Dec 2015.
Deep Reinforcement Learning with Double Q-learning. H. van Hasselt, A. Guez, David Silver. 22 Sep 2015. [OffRL]
Approximate Inference and Stochastic Optimal Control. K. Rawlik, Marc Toussaint, S. Vijayakumar. 20 Sep 2010.
Dynamic Policy Programming. M. G. Azar, Vicenç Gómez, H. Kappen. 12 Apr 2010.