ResearchTrend.AI

Exploration versus exploitation in reinforcement learning: a stochastic control approach
arXiv:1812.01552 · 4 December 2018
Haoran Wang, T. Zariphopoulou, X. Zhou

Papers citing "Exploration versus exploitation in reinforcement learning: a stochastic control approach"

14 papers shown (title · authors · citations · date):

1. A Generative Framework for Predictive Modeling of Multiple Chronic Conditions Using Graph Variational Autoencoder and Bandit-Optimized Graph Neural Network
   Julian Carvajal Rico, A. Alaeddini, Syed Hasib Akhter Faruqui, S. Fisher-Hoch, J. McCormick · 0 citations · 13 Mar 2025

2. An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
   Hashmath Shaik, Alex Doboli · 0 citations · 31 Dec 2024

3. Gradient Flows for Regularized Stochastic Control Problems
   David Siska, Lukasz Szpruch · 21 citations · 10 Jun 2020

4. A Tour of Reinforcement Learning: The View from Continuous Control
   Benjamin Recht · 623 citations · 25 Jun 2018

5. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
   Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine · 8,236 citations · 04 Jan 2018

6. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
   Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans · 106 citations · 06 Jul 2017

7. Parameter Space Noise for Exploration
   Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz · 594 citations · 06 Jun 2017

8. Bridging the Gap Between Value and Policy Based Reinforcement Learning
   Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans · 469 citations · 28 Feb 2017

9. Reinforcement Learning with Deep Energy-Based Policies
   Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine · 1,329 citations · 27 Feb 2017

10. Learning to Navigate in Complex Environments
    Piotr Wojciech Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andy Ballard, ..., Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, D. Kumaran, R. Hadsell · 876 citations · 11 Nov 2016

11. Taming the Noise in Reinforcement Learning via Soft Updates
    Roy Fox, Ari Pakman, Naftali Tishby · 336 citations · 28 Dec 2015

12. Continuous control with deep reinforcement learning
    Timothy Lillicrap, Jonathan J. Hunt, Alexander Pritzel, N. Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra · 13,174 citations · 09 Sep 2015

13. End-to-End Training of Deep Visuomotor Policies
    Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel · 3,418 citations · 02 Apr 2015

14. Learning to Optimize Via Posterior Sampling
    Daniel Russo, Benjamin Van Roy · 697 citations · 11 Jan 2013