
Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs

25 March 2025
Alexander Ryabchenko
Idan Attias
Daniel M. Roy
Abstract

We study online learning with oblivious losses and delays under a novel ``capacity constraint'' that limits how many past rounds can be tracked simultaneously for delayed feedback. Under ``clairvoyance'' (i.e., delay durations are revealed upfront each round) and/or ``preemptibility'' (i.e., we can stop tracking feedback from previously chosen rounds), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the ``optimal capacity'' needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance gracefully degrading under suboptimal capacity. For $K$ actions and total delay $D$ over $T$ rounds, under clairvoyance and assuming capacity $C = \Omega(\log(T))$, we achieve regret $\widetilde{\Theta}\bigl(\sqrt{TK + DK/C + D\log(K)}\bigr)$ for bandits and $\widetilde{\Theta}\bigl(\sqrt{(D+T)\log(K)}\bigr)$ for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound $d_{\max}$, adding $\widetilde{O}(d_{\max})$ to the regret. For fixed delays $d$ (i.e., $D = Td$), the minimax regret is $\Theta\bigl(\sqrt{TK(1+d/C) + Td\log(K)}\bigr)$ and the optimal capacity is $\Theta\bigl(\min\{K/\log(K), d\}\bigr)$ in the bandit setting, while in the full-information setting, the minimax regret is $\Theta\bigl(\sqrt{T(d+1)\log(K)}\bigr)$ and the optimal capacity is $\Theta(1)$. For round-dependent and fixed delays, our upper bounds are achieved using novel scheduling policies, based on Pareto-distributed proxy delays and batching techniques.
Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.
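
To make the fixed-delay bandit trade-off concrete, the following Python sketch evaluates the rate expression from the abstract with all constants dropped. It is only an illustration of the formulas, not the authors' algorithm: the function names are hypothetical, the "unlimited capacity" rate is taken here to be the limit of the stated expression as C grows, and K >= 2 and d >= 1 are assumed so that log(K) > 0 and the capacity is positive.

```python
import math


def fixed_delay_bandit_rate(T, K, d, C):
    """Order of the fixed-delay bandit regret, sqrt(TK(1 + d/C) + T d log K).

    Constants are dropped; this only evaluates the expression in the abstract.
    """
    return math.sqrt(T * K * (1 + d / C) + T * d * math.log(K))


def unlimited_capacity_rate(T, K, d):
    """Limit of the expression above as C grows (assumed reference point)."""
    return math.sqrt(T * K + T * d * math.log(K))


def optimal_capacity_order(K, d):
    """Order of the optimal capacity, min{K / log K, d}, constants dropped."""
    return min(K / math.log(K), d)


if __name__ == "__main__":
    T, K, d = 10_000, 100, 50  # illustrative values, not from the paper
    c_star = optimal_capacity_order(K, d)
    for C in (1, c_star, d):
        ratio = fixed_delay_bandit_rate(T, K, d, C) / unlimited_capacity_rate(T, K, d)
        print(f"C = {C:7.2f}  rate / unlimited-capacity rate = {ratio:.3f}")
```

At C = min{K/log(K), d}, the capacity term TKd/C is at most max{TK, Td log(K)}, so the expression stays within a factor of sqrt(2) of its large-C limit; this is the arithmetic behind the stated optimal capacity order.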

@article{ryabchenko2025_2503.19856,
  title={Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs},
  author={Alexander Ryabchenko and Idan Attias and Daniel M. Roy},
  journal={arXiv preprint arXiv:2503.19856},
  year={2025}
}