1705.10257
Cited By
Boltzmann Exploration Done Right
29 May 2017
Nicolò Cesa-Bianchi
Claudio Gentile
Gábor Lugosi
Gergely Neu
Papers citing "Boltzmann Exploration Done Right"
14 / 14 papers shown
Title
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
109
35
0
20 Feb 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
141
16
0
28 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
371
0
0
31 Dec 2024
Random Latent Exploration for Deep Reinforcement Learning
Srinath Mahankali
Zhang-Wei Hong
Ayush Sekhari
Alexander Rakhlin
Pulkit Agrawal
188
3
0
18 Jul 2024
On Explore-Then-Commit Strategies
Aurélien Garivier
E. Kaufmann
Tor Lattimore
72
107
0
29 May 2016
Regret Analysis of the Anytime Optimally Confident UCB Algorithm
Tor Lattimore
49
26
0
29 Mar 2016
Optimally Confident UCB: Improved Regret for Finite-Armed Bandits
Tor Lattimore
63
46
0
28 Jul 2015
Online Linear Optimization via Smoothing
Jacob D. Abernethy
Chansoo Lee
Abhinav Sinha
Ambuj Tewari
110
77
0
23 May 2014
Algorithms for multi-armed bandit problems
Volodymyr Kuleshov
Doina Precup
125
350
0
25 Feb 2014
Generalization and Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Zheng Wen
77
314
0
04 Feb 2014
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Olivier Cappé
Aurélien Garivier
Odalric-Ambrym Maillard
Rémi Munos
Gilles Stoltz
118
395
0
03 Oct 2012
Further Optimal Regret Bounds for Thompson Sampling
Shipra Agrawal
Navin Goyal
100
442
0
15 Sep 2012
Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
E. Kaufmann
N. Korda
Rémi Munos
155
588
0
18 May 2012
Challenging the empirical mean and empirical variance: a deviation study
O. Catoni
159
462
0
10 Sep 2010