Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs

25 March 2025

Alexander Ryabchenko, Idan Attias, Daniel M. Roy

Main: 15 pages, Bibliography: 4 pages, Appendix: 23 pages, 2 figures, 9 tables

Abstract

We study online learning with oblivious losses and delays under a novel ``capacity constraint'' that limits how many past rounds can be tracked simultaneously for delayed feedback. Under ``clairvoyance'' (i.e., delay durations are revealed upfront each round) and/or ``preemptibility'' (i.e., the ability to stop tracking feedback from previously chosen rounds), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the ``optimal capacity'' needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance degrading gracefully under suboptimal capacity. For $K$ actions and total delay $D$ over $T$ rounds, under clairvoyance and assuming capacity $C = \Omega(\log(T))$, we achieve regret $\widetilde{\Theta}\bigl(\sqrt{TK + DK/C + D\log(K)}\bigr)$ for bandits and $\widetilde{\Theta}\bigl(\sqrt{(D+T)\log(K)}\bigr)$ for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound $d_{\max}$, which adds $\widetilde{O}(d_{\max})$ to the regret. For fixed delays $d$ (i.e., $D=Td$), the minimax regret is $\Theta\bigl(\sqrt{TK(1+d/C)+Td\log(K)}\bigr)$ and the optimal capacity is $\Theta\bigl(\min\{K/\log(K),d\}\bigr)$ in the bandit setting, while in the full-information setting, the minimax regret is $\Theta\bigl(\sqrt{T(d+1)\log(K)}\bigr)$ and the optimal capacity is $\Theta(1)$. For round-dependent and fixed delays, our upper bounds are achieved using novel scheduling policies based on Pareto-distributed proxy delays and batching techniques. Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.
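To make the fixed-delay bandit rate concrete, the following is a minimal numerical sketch, not the paper's algorithm. It evaluates the expression $\sqrt{TK(1+d/C)+Td\log(K)}$ with all constants dropped and compares it with the unlimited-capacity expression $\sqrt{TK+Td\log(K)}$. The function names and the example values of $T$, $K$, $d$ are illustrative assumptions, and the hidden constant in $\Theta(\min\{K/\log(K),d\})$ is taken to be 1.

```python
import math


def fixed_delay_bandit_rate(T, K, d, C):
    """Rate expression sqrt(T*K*(1 + d/C) + T*d*log(K)), constants dropped.

    Passing C = math.inf gives the unlimited-capacity expression.
    """
    return math.sqrt(T * K * (1 + d / C) + T * d * math.log(K))


def optimal_capacity_order(K, d):
    """min(K / log(K), d), floored at 1 (assumption: hidden constant is 1)."""
    return max(1.0, min(K / math.log(K), d))


# Hypothetical problem sizes, chosen only for illustration.
T, K, d = 10_000, 100, 50

C_star = optimal_capacity_order(K, d)
unlimited = fixed_delay_bandit_rate(T, K, d, math.inf)
at_c_star = fixed_delay_bandit_rate(T, K, d, C_star)
at_c_one = fixed_delay_bandit_rate(T, K, d, 1)

# Ratio of each rate expression to the unlimited-capacity expression.
ratio_star = at_c_star / unlimited
ratio_one = at_c_one / unlimited
print(f"C* ~ {C_star:.1f}, ratio at C*: {ratio_star:.2f}, ratio at C=1: {ratio_one:.2f}")
```

At capacity $\min\{K/\log(K),d\}$ the extra term $TKd/C$ is at most $\max\{Td\log(K), TK\}$, so the expression is within a factor $\sqrt{2}$ of the unlimited-capacity one; at $C=1$ the gap in this example is noticeably larger.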

@article{ryabchenko2025_2503.19856,
  title={Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs},
  author={Alexander Ryabchenko and Idan Attias and Daniel M. Roy},
  journal={arXiv preprint arXiv:2503.19856},
  year={2025}
}