Boltzmann Exploration Done Right

29 May 2017

Nicolò Cesa-Bianchi

Claudio Gentile

Gábor Lugosi

Papers citing "Boltzmann Exploration Done Right"

Title | Authors | Tags | Date
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai | OffRL | 20 Feb 2025
Divergence-Augmented Policy Optimization | Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang | OffRL | 28 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems | Hashmath Shaik, Alex Doboli | OffRL, ELM | 31 Dec 2024
Random Latent Exploration for Deep Reinforcement Learning | Srinath Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal | - | 18 Jul 2024
On Explore-Then-Commit Strategies | Aurélien Garivier, E. Kaufmann, Tor Lattimore | - | 29 May 2016
Regret Analysis of the Anytime Optimally Confident UCB Algorithm | Tor Lattimore | - | 29 Mar 2016
Optimally Confident UCB: Improved Regret for Finite-Armed Bandits | Tor Lattimore | - | 28 Jul 2015
Online Linear Optimization via Smoothing | Jacob D. Abernethy, Chansoo Lee, Abhinav Sinha, Ambuj Tewari | - | 23 May 2014
Algorithms for multi-armed bandit problems | Volodymyr Kuleshov, Doina Precup | - | 25 Feb 2014
Generalization and Exploration via Randomized Value Functions | Ian Osband, Benjamin Van Roy, Zheng Wen | - | 04 Feb 2014
Kullback-Leibler upper confidence bounds for optimal sequential allocation | Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz | - | 03 Oct 2012
Further Optimal Regret Bounds for Thompson Sampling | Shipra Agrawal, Navin Goyal | - | 15 Sep 2012
Thompson Sampling: An Asymptotically Optimal Finite Time Analysis | E. Kaufmann, N. Korda, Rémi Munos | - | 18 May 2012
Challenging the empirical mean and empirical variance: a deviation study | O. Catoni | - | 10 Sep 2010