
arXiv:2302.00392
Delayed Feedback in Kernel Bandits

1 February 2023
Sattar Vakili, Danyal Ahmed, A. Bernacchia, Ciara Pike-Burke
Abstract

Black-box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel-based bandit problem (also known as Bayesian optimisation), where a learner aims to optimise a kernelized function through sequential noisy observations. Existing work predominantly assumes that feedback is immediately available, an assumption which fails in many real-world situations, including recommendation systems, clinical trials and hyperparameter tuning. We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with $\tilde{\mathcal{O}}(\sqrt{\Gamma_k(T)T}+\mathbb{E}[\tau])$ regret, where $T$ is the number of time steps, $\Gamma_k(T)$ is the maximum information gain of the kernel with $T$ observations, and $\tau$ is the delay random variable. This is a significant improvement over the state-of-the-art regret bound of $\tilde{\mathcal{O}}(\Gamma_k(T)\sqrt{T}+\mathbb{E}[\tau]\Gamma_k(T))$ reported in Verma et al. (2022). In particular, for very non-smooth kernels, the information gain grows almost linearly in time, trivializing the existing results. We also validate our theoretical results with simulations.
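To make the setting concrete, here is a minimal, illustrative sketch of a GP-UCB-style kernel bandit loop in which noisy rewards arrive only after stochastic delays. This is not the paper's algorithm: the RBF kernel, discrete grid, geometric delay distribution, and confidence width are all arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, ls=0.2):
    # squared-exponential (RBF) kernel on 1-D inputs
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls**2))

# Unknown objective on a discrete grid (stand-in for a kernelized function)
grid = np.linspace(0, 1, 100)
f = np.sin(6 * grid)
noise = 0.1

T = 200
delays = rng.geometric(p=0.2, size=T)  # stochastic delays, E[tau] = 5
pending = []   # (arrival_time, arm_index, noisy_reward) not yet observed
X, Y = [], []  # feedback received so far

regret = 0.0
for t in range(T):
    # collect feedback whose delay has elapsed
    arrived = [p for p in pending if p[0] <= t]
    pending = [p for p in pending if p[0] > t]
    for _, i, y in arrived:
        X.append(i)
        Y.append(y)

    if X:
        # GP posterior on the received data, then a UCB acquisition step
        K = rbf(grid[X], grid[X]) + noise**2 * np.eye(len(X))
        Ks = rbf(grid, grid[X])
        Kinv = np.linalg.inv(K)
        mu = Ks @ Kinv @ np.array(Y)
        # posterior variance = prior variance (1) minus explained variance
        var = np.clip(1.0 - np.einsum("ij,jk,ik->i", Ks, Kinv, Ks), 1e-12, None)
        i = int(np.argmax(mu + 2.0 * np.sqrt(var)))
    else:
        i = int(rng.integers(len(grid)))

    # play arm i; its noisy reward will only arrive delays[t] steps later
    y = f[i] + noise * rng.normal()
    pending.append((t + delays[t], i, y))
    regret += f.max() - f[i]

print(f"average regret after {T} steps: {regret / T:.3f}")
```

Note how the delay enters only through the bookkeeping in `pending`: the learner keeps selecting points from a posterior that lags behind its own actions, which is exactly why the expected delay $\mathbb{E}[\tau]$ shows up additively in the regret bound above.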
