arXiv:1703.07608
Deep Exploration via Randomized Value Functions

22 March 2017
Ian Osband
Benjamin Van Roy
Daniel Russo
Zheng Wen

Papers citing "Deep Exploration via Randomized Value Functions"

46 / 46 papers shown
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
Moritz A. Zanger
Pascal R. van der Vaart
Wendelin Bohmer
M. Spaan
UQCV, BDL
490
2
0
14 Mar 2025
Random Latent Exploration for Deep Reinforcement Learning
Srinath Mahankali
Zhang-Wei Hong
Ayush Sekhari
Alexander Rakhlin
Pulkit Agrawal
230
3
0
18 Jul 2024
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
193
2
0
13 Jun 2024
Ensemble sampling for linear bandits: small ensembles suffice
David Janz
A. Litvak
Csaba Szepesvári
100
1
0
14 Nov 2023
Selective Uncertainty Propagation in Offline RL
Sanath Kumar Krishnamurthy
Shrey Modi
Tanmay Gangwani
S. Katariya
Branislav Kveton
A. Rangi
OffRL
191
0
0
01 Feb 2023
q-Learning in Continuous Time
Yanwei Jia
X. Zhou
OffRL
104
77
0
02 Jul 2022
Overcoming Model Bias for Robust Offline Deep Reinforcement Learning
Phillip Swazinna
Steffen Udluft
Thomas Runkler
OffRL
50
84
0
12 Aug 2020
Worst-Case Regret Bounds for Exploration via Randomized Value Functions
Daniel Russo
OffRL
48
88
0
07 Jun 2019
Randomized Prior Functions for Deep Reinforcement Learning
Ian Osband
John Aslanides
Albin Cassirer
UQCV, BDL
76
380
0
08 Jun 2018
Diversity-Driven Exploration Strategy for Deep Reinforcement Learning
Zhang-Wei Hong
Tzu-Yun Shann
Shih-Yang Su
Yi-Hsiang Chang
Chun-Yi Lee
69
124
0
13 Feb 2018
Efficient Exploration through Bayesian Deep Q-Networks
Kamyar Azizzadenesheli
Anima Anandkumar
OffRL, BDL
79
163
0
13 Feb 2018
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
Zachary Chase Lipton
Xiujun Li
Jianfeng Gao
Lihong Li
Faisal Ahmed
Li Deng
78
172
0
15 Nov 2017
The Uncertainty Bellman Equation and Exploration
Brendan O'Donoghue
Ian Osband
Rémi Munos
Volodymyr Mnih
70
192
0
15 Sep 2017
A Distributional Perspective on Reinforcement Learning
Marc G. Bellemare
Will Dabney
Rémi Munos
OffRL
101
1,506
0
21 Jul 2017
Noisy Networks for Exploration
Meire Fortunato
M. G. Azar
Bilal Piot
Jacob Menick
Ian Osband
...
Rémi Munos
Demis Hassabis
Olivier Pietquin
Charles Blundell
Shane Legg
79
897
0
30 Jun 2017
Spectrally-normalized margin bounds for neural networks
Peter L. Bartlett
Dylan J. Foster
Matus Telgarsky
ODL
210
1,225
0
26 Jun 2017
Ensemble Sampling
Xiuyuan Lu
Benjamin Van Roy
129
121
0
20 May 2017
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
92
778
0
16 Mar 2017
Gaussian-Dirichlet Posterior Dominance in Sequential Learning
Ian Osband
Benjamin Van Roy
41
9
0
14 Feb 2017
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Haoran Tang
Rein Houthooft
Davis Foote
Adam Stooke
Xi Chen
Yan Duan
John Schulman
F. Turck
Pieter Abbeel
OffRL
106
775
0
15 Nov 2016
Understanding deep learning requires rethinking generalization
Chiyuan Zhang
Samy Bengio
Moritz Hardt
Benjamin Recht
Oriol Vinyals
HAI
348
4,636
0
10 Nov 2016
On Lower Bounds for Regret in Reinforcement Learning
Ian Osband
Benjamin Van Roy
83
101
0
09 Aug 2016
Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
Ian Osband
Benjamin Van Roy
BDL
85
261
0
01 Jul 2016
Safe and Efficient Off-Policy Reinforcement Learning
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
OffRL
138
617
0
08 Jun 2016
Unifying Count-Based Exploration and Intrinsic Motivation
Marc G. Bellemare
S. Srinivasan
Georg Ostrovski
Tom Schaul
D. Saxton
Rémi Munos
179
1,483
0
06 Jun 2016
Deep Exploration via Bootstrapped DQN
Ian Osband
Charles Blundell
Alexander Pritzel
Benjamin Van Roy
121
1,313
0
15 Feb 2016
Angrier Birds: Bayesian reinforcement learning
Imanol Arrieta-Ibarra
Bernardo Ramos
Lars Roemheld
33
1
0
06 Jan 2016
State of the Art Control of Atari Games Using Shallow Reinforcement Learning
Yitao Liang
Marlos C. Machado
Erik Talvitie
Michael Bowling
72
113
0
04 Dec 2015
Prioritized Experience Replay
Tom Schaul
John Quan
Ioannis Antonoglou
David Silver
OffRL
223
3,797
0
18 Nov 2015
How much does your data exploration overfit? Controlling bias via information usage
D. Russo
James Zou
52
192
0
16 Nov 2015
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Christoph Dann
Emma Brunskill
74
249
0
29 Oct 2015
Bootstrapped Thompson Sampling and Deep Exploration
Ian Osband
Benjamin Van Roy
162
105
0
01 Jul 2015
Model-based Reinforcement Learning and the Eluder Dimension
Ian Osband
Benjamin Van Roy
92
190
0
07 Jun 2014
Learning to Optimize via Information-Directed Sampling
Daniel Russo
Benjamin Van Roy
163
284
0
21 Mar 2014
Near-optimal Reinforcement Learning in Factored MDPs
Ian Osband
Benjamin Van Roy
99
124
0
15 Mar 2014
Generalization and Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Zheng Wen
91
314
0
04 Feb 2014
The Sample-Complexity of General Reinforcement Learning
Tor Lattimore
Marcus Hutter
P. Sunehag
VLM
80
67
0
22 Aug 2013
(More) Efficient Reinforcement Learning via Posterior Sampling
Ian Osband
Daniel Russo
Benjamin Van Roy
131
535
0
04 Jun 2013
Regret Bounds for Reinforcement Learning with Policy Advice
M. G. Azar
A. Lazaric
Emma Brunskill
94
36
0
05 May 2013
Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems
M. Ibrahimi
Adel Javanmard
Benjamin Van Roy
89
91
0
24 Mar 2013
Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
R. Ortner
D. Ryabko
OffRL
102
85
0
11 Feb 2013
Learning to Optimize Via Posterior Sampling
Daniel Russo
Benjamin Van Roy
203
703
0
11 Jan 2013
Further Optimal Regret Bounds for Thompson Sampling
Shipra Agrawal
Navin Goyal
110
442
0
15 Sep 2012
Thompson Sampling for Contextual Bandits with Linear Payoffs
Shipra Agrawal
Navin Goyal
204
1,006
0
15 Sep 2012
REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs
Peter L. Bartlett
Ambuj Tewari
93
286
0
09 May 2012
Bootstrapping data arrays of arbitrary order
Art B. Owen
Dean Eckles
87
39
0
10 Jun 2011