arXiv:2102.04168

Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature

8 February 2021
Kefan Dong
Jiaqi Yang
Tengyu Ma

Papers citing "Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature"

38 / 38 papers shown
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
Subhojyoti Mukherjee
Josiah P. Hanna
Qiaomin Xie
Robert Nowak
214
2
0
07 Jun 2024
Bilinear Classes: A Structural Framework for Provable Generalization in RL
S. Du
Sham Kakade
Jason D. Lee
Shachar Lovett
G. Mahajan
Wen Sun
Ruosong Wang
OffRL
171
191
0
19 Mar 2021
Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles
Sanath Kumar Krishnamurthy
Vitor Hadad
Susan Athey
OffRL
124
23
0
26 Feb 2021
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Zhuoran Yang
Chi Jin
Zhaoran Wang
Mengdi Wang
Michael I. Jordan
82
18
0
09 Nov 2020
Efficient Planning in Large MDPs with Weak Linear Function Approximation
R. Shariff
Csaba Szepesvári
68
22
0
13 Jul 2020
Information Theoretic Regret Bounds for Online Nonlinear Control
Sham Kakade
A. Krishnamurthy
Kendall Lowrey
Motoya Ohnishi
Wen Sun
67
119
0
22 Jun 2020
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
Alekh Agarwal
Sham Kakade
A. Krishnamurthy
Wen Sun
OffRL
170
227
0
18 Jun 2020
A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning
Sijia Liu
Pin-Yu Chen
B. Kailkhura
Gaoyuan Zhang
A. Hero III
P. Varshney
72
235
0
11 Jun 2020
Model-Based Reinforcement Learning with Value-Targeted Regression
Alex Ayoub
Zeyu Jia
Csaba Szepesvári
Mengdi Wang
Lin F. Yang
OffRL
94
305
0
01 Jun 2020
Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension
Ruosong Wang
Ruslan Salakhutdinov
Lin F. Yang
65
55
0
21 May 2020
Model-Augmented Actor-Critic: Backpropagating through Paths
I. Clavera
Yao Fu
Pieter Abbeel
77
88
0
16 May 2020
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability
D. Simchi-Levi
Yunzong Xu
OffRL
379
111
0
28 Mar 2020
Improved Optimistic Algorithms for Logistic Bandits
Louis Faury
Marc Abeille
Clément Calauzènes
Olivier Fercoq
87
95
0
18 Feb 2020
Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
Dylan J. Foster
Alexander Rakhlin
369
212
0
12 Feb 2020
Dota 2 with Large Scale Deep Reinforcement Learning
OpenAI
Christopher Berner
Greg Brockman
Brooke Chan
...
Szymon Sidor
Ilya Sutskever
Jie Tang
Filip Wolski
Susan Zhang
GNN, VLM, CLL, AI4CE, LRM
169
1,836
0
13 Dec 2019
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
66
283
0
12 Dec 2019
Optimism in Reinforcement Learning with Generalized Linear Function Approximation
Yining Wang
Ruosong Wang
S. Du
A. Krishnamurthy
182
137
0
09 Dec 2019
Dream to Control: Learning Behaviors by Latent Imagination
Danijar Hafner
Timothy Lillicrap
Jimmy Ba
Mohammad Norouzi
VLM
126
1,371
0
03 Dec 2019
Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning
Dipendra Kumar Misra
Mikael Henaff
A. Krishnamurthy
John Langford
79
151
0
13 Nov 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
69
321
0
01 Aug 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
98
560
0
11 Jul 2019
When to Trust Your Model: Model-Based Policy Optimization
Michael Janner
Justin Fu
Marvin Zhang
Sergey Levine
OffRL
102
956
0
19 Jun 2019
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Lin F. Yang
Mengdi Wang
OffRL, GP
64
288
0
24 May 2019
Provably efficient RL with Rich Observations via Latent State Decoding
S. Du
A. Krishnamurthy
Nan Jiang
Alekh Agarwal
Miroslav Dudík
John Langford
OffRL
74
230
0
25 Jan 2019
Learning Latent Dynamics for Planning from Pixels
Danijar Hafner
Timothy Lillicrap
Ian S. Fischer
Ruben Villegas
David R Ha
Honglak Lee
James Davidson
BDL
88
1,446
0
12 Nov 2018
On Oracle-Efficient PAC RL with Rich Observations
Christoph Dann
Nan Jiang
A. Krishnamurthy
Alekh Agarwal
John Langford
Robert Schapire
49
98
0
01 Mar 2018
Applications of Deep Learning and Reinforcement Learning to Biological Data
M. S. M. Mahmud
M. S. Kaiser
Amir Hussain
S. Vassanelli
OffRL, AI4CE
67
645
0
10 Nov 2017
No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis
Rong Ge
Chi Jin
Yi Zheng
132
436
0
03 Apr 2017
Input Convex Neural Networks
Brandon Amos
Lei Xu
J. Zico Kolter
280
624
0
22 Sep 2016
Matrix Completion has No Spurious Local Minimum
Rong Ge
Jason D. Lee
Tengyu Ma
114
599
0
24 May 2016
Empirical Evaluation of Rectified Activations in Convolutional Network
Bing Xu
Naiyan Wang
Tianqi Chen
Mu Li
140
2,913
0
05 May 2015
Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition
Rong Ge
Furong Huang
Chi Jin
Yang Yuan
143
1,059
0
06 Mar 2015
Optimal rates for zero-order convex optimization: the power of two function evaluations
John C. Duchi
Michael I. Jordan
Martin J. Wainwright
Andre Wibisono
82
489
0
07 Dec 2013
Finite-Time Analysis of Kernelised Contextual Bandits
Michal Valko
N. Korda
Rémi Munos
I. Flaounas
N. Cristianini
185
275
0
26 Sep 2013
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit
Alexandra Carpentier
Rémi Munos
82
102
0
18 May 2012
A tail inequality for quadratic forms of subgaussian random vectors
Daniel J. Hsu
Sham Kakade
Tong Zhang
139
422
0
13 Oct 2011
Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
Sham Kakade
Adam Tauman Kalai
Varun Kanade
Ohad Shamir
196
180
0
11 Apr 2011
Online Learning via Sequential Complexities
Alexander Rakhlin
Karthik Sridharan
Ambuj Tewari
117
104
0
06 Jun 2010