Boltzmann Exploration Done Right

29 May 2017
Nicolò Cesa-Bianchi
Claudio Gentile
Gábor Lugosi
Gergely Neu
arXiv: 1705.10257

Papers citing "Boltzmann Exploration Done Right"

14 / 14 papers shown
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
109
35
0
20 Feb 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
143
16
0
28 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
371
0
0
31 Dec 2024
Random Latent Exploration for Deep Reinforcement Learning
Srinath Mahankali
Zhang-Wei Hong
Ayush Sekhari
Alexander Rakhlin
Pulkit Agrawal
199
3
0
18 Jul 2024
On Explore-Then-Commit Strategies
Aurélien Garivier
E. Kaufmann
Tor Lattimore
72
107
0
29 May 2016
Regret Analysis of the Anytime Optimally Confident UCB Algorithm
Tor Lattimore
49
26
0
29 Mar 2016
Optimally Confident UCB: Improved Regret for Finite-Armed Bandits
Tor Lattimore
63
46
0
28 Jul 2015
Online Linear Optimization via Smoothing
Jacob D. Abernethy
Chansoo Lee
Abhinav Sinha
Ambuj Tewari
114
77
0
23 May 2014
Algorithms for multi-armed bandit problems
Volodymyr Kuleshov
Doina Precup
130
350
0
25 Feb 2014
Generalization and Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Zheng Wen
77
314
0
04 Feb 2014
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Olivier Cappé
Aurélien Garivier
Odalric-Ambrym Maillard
Rémi Munos
Gilles Stoltz
120
395
0
03 Oct 2012
Further Optimal Regret Bounds for Thompson Sampling
Shipra Agrawal
Navin Goyal
100
442
0
15 Sep 2012
Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
E. Kaufmann
N. Korda
Rémi Munos
157
588
0
18 May 2012
Challenging the empirical mean and empirical variance: a deviation study
O. Catoni
161
462
0
10 Sep 2010