  3. 2405.10817
19
0

Restless Linear Bandits

17 May 2024
A. Khaleghi
Abstract

A more general formulation of the linear bandit problem is considered to allow for dependencies over time. Specifically, it is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_t,\ t \in \mathbb{N})$ which gives rise to pay-offs. This instance of the problem can be viewed as a generalization of both the classical linear bandits with iid noise, and the finite-armed restless bandits. In light of the well-known computational hardness of optimal policies for restless bandits, an approximation is proposed whose error is shown to be controlled by the $\varphi$-dependence between consecutive $\theta_t$. An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_t$ has an exponential mixing rate. The proposed algorithm is shown to incur a sub-linear regret of $\mathcal{O}\left(\sqrt{d n\,\mathrm{polylog}(n)}\right)$ with respect to an oracle that always plays a multiple of $\mathbb{E}\theta_t$. The main challenge in this setting is to ensure that the exploration-exploitation strategy is robust against long-range dependencies. The proposed method relies on Berbee's coupling lemma to carefully select near-independent samples and construct confidence ellipsoids around empirical estimates of $\mathbb{E}\theta_t$.
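To make the optimistic (UCB-style) mechanism concrete, the following is a minimal sketch of a generic confidence-ellipsoid arm-selection rule of the kind LinMix-UCB builds on. It assumes a fixed, known ellipsoid radius `beta`; in the paper the radius instead depends on the $\varphi$-mixing coefficients, with the near-independent samples obtained via Berbee's coupling lemma. The function names and the constant `beta` are illustrative, not taken from the paper.

```python
import numpy as np

def select_arm(arms, V, b, beta):
    """Pick the arm maximizing the optimistic index
    <x, theta_hat> + beta * sqrt(x^T V^{-1} x),
    where theta_hat is the (regularized) least-squares estimate V^{-1} b."""
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    best_val, best_idx = -np.inf, 0
    for i, x in enumerate(arms):
        # Optimistic upper bound over the confidence ellipsoid around theta_hat.
        ucb = x @ theta_hat + beta * np.sqrt(x @ V_inv @ x)
        if ucb > best_val:
            best_val, best_idx = ucb, i
    return best_idx

def update(V, b, x, reward):
    """Rank-one update of the design matrix and response vector
    after playing action x and observing the pay-off."""
    return V + np.outer(x, x), b + reward * x
```

The same skeleton underlies classical LinUCB; the contribution of the paper is in how the confidence ellipsoids remain valid when the parameters $\theta_t$ are dependent over time rather than fixed.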
