
Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs

Main:15 Pages
2 Figures
Bibliography:4 Pages
9 Tables
Appendix:23 Pages
Abstract

We study online learning with oblivious losses and delays under a novel ``capacity constraint'' that limits how many past rounds can be tracked simultaneously for delayed feedback. Under ``clairvoyance'' (i.e., delay durations are revealed upfront each round) and/or ``preemptibility'' (i.e., we can stop tracking feedback from previously chosen rounds), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the ``optimal capacity'' needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance gracefully degrading under suboptimal capacity. For $K$ actions and total delay $D$ over $T$ rounds, under clairvoyance and assuming capacity $C = \Omega(\log(T))$, we achieve regret $\widetilde{\Theta}(\sqrt{TK + DK/C + D\log(K)})$ for bandits and $\widetilde{\Theta}(\sqrt{(D+T)\log(K)})$ for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound $d_{\max}$, adding $\widetilde{O}(d_{\max})$ to the regret. For fixed delays $d$ (i.e., $D=Td$), the minimax regret is $\Theta\bigl(\sqrt{TK(1+d/C)+Td\log(K)}\bigr)$ and the optimal capacity is $\Theta\bigl(\min\{K/\log(K),d\}\bigr)$ in the bandit setting, while in the full-information setting, the minimax regret is $\Theta\bigl(\sqrt{T(d+1)\log(K)}\bigr)$ and the optimal capacity is $\Theta(1)$. For round-dependent and fixed delays, our upper bounds are achieved using novel scheduling policies, based on Pareto-distributed proxy delays and batching techniques.
Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.
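
As a rough numerical illustration of the fixed-delay bandit rate stated above, the following sketch evaluates $\sqrt{TK(1+d/C)+Td\log(K)}$ with all constants dropped and compares it against the unlimited-capacity rate $\sqrt{TK+Td\log(K)}$. The function names and the example values of $T$, $K$, $d$ are hypothetical choices made here for illustration; they are not taken from the paper, and the sketch only shows orders of magnitude, not the paper's algorithms.

```python
import math

def fixed_delay_bandit_rate(T, K, d, C):
    """Order of the fixed-delay bandit regret from the abstract (constants dropped)."""
    return math.sqrt(T * K * (1 + d / C) + T * d * math.log(K))

def unlimited_capacity_rate(T, K, d):
    """Same expression with the d/C term removed (capacity C -> infinity)."""
    return math.sqrt(T * K + T * d * math.log(K))

def optimal_capacity_order(K, d):
    """Order of the optimal capacity, min{K/log(K), d}, rounded up to at least 1.

    The rounding and the floor at 1 are illustrative assumptions.
    """
    return max(1, math.ceil(min(K / math.log(K), d)))

# Hypothetical example values.
T, K, d = 10_000, 100, 50
C_star = optimal_capacity_order(K, d)
ratio_at_C_star = fixed_delay_bandit_rate(T, K, d, C_star) / unlimited_capacity_rate(T, K, d)
ratio_at_C_one = fixed_delay_bandit_rate(T, K, d, 1) / unlimited_capacity_rate(T, K, d)
```

Once $C \ge \min\{K/\log(K), d\}$, the extra term $TKd/C$ is at most $Td\log(K)$ or at most $TK$, so the expression under the square root at most doubles and `ratio_at_C_star` stays below $\sqrt{2}$. With `C = 1` the ratio is noticeably larger.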

@article{ryabchenko2025_2503.19856,
  title={Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs},
  author={Alexander Ryabchenko and Idan Attias and Daniel M. Roy},
  journal={arXiv preprint arXiv:2503.19856},
  year={2025}
}