We study online learning with oblivious losses and delays under a novel ``capacity constraint'' that limits how many past rounds can be tracked simultaneously for delayed feedback. Under ``clairvoyance'' (i.e., the delay duration is revealed upfront in each round) and/or ``preemptibility'' (i.e., the learner can stop tracking feedback from previously chosen rounds), we establish matching upper and lower bounds (up to logarithmic terms) on achievable regret, characterizing the ``optimal capacity'' needed to match the minimax rates of classical delayed online learning, which implicitly assume unlimited capacity. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance degrading gracefully under suboptimal capacity. For actions and total delay over rounds, under clairvoyance and assuming capacity , we achieve regret for bandits and for full-information feedback. When replacing clairvoyance with preemptibility, we require a known maximum delay bound , adding to the regret. For fixed delays (i.e., ), the minimax regret is and the optimal capacity is in the bandit setting, while in the full-information setting, the minimax regret is and the optimal capacity is . For round-dependent and fixed delays, our upper bounds are achieved using novel scheduling policies based on Pareto-distributed proxy delays and batching techniques. Crucially, our work unifies delayed bandits, label-efficient learning, and online scheduling frameworks, demonstrating that robust online learning under delayed feedback is possible with surprisingly modest tracking capacity.
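To make the capacity constraint concrete, the following is a minimal sketch of the tracking protocol under clairvoyance. It is not the paper's scheduling policy (which relies on Pareto-distributed proxy delays and batching); it uses a plain greedy rule that tracks a round whenever a slot is free. The function name `simulate_tracking`, the greedy rule, and the convention that feedback for round `t` with delay `d` frees its slot before round `t + d + 1` are all assumptions made for illustration.

```python
import heapq


def simulate_tracking(delays, capacity):
    """Greedy tracking under a capacity constraint (illustrative only).

    delays[t] is the delay of round t, revealed at the start of round t
    (clairvoyance). Assumed convention: feedback for a tracked round t
    arrives at the end of round t + delays[t], freeing its slot.
    Returns the list of rounds whose feedback was tracked.
    """
    in_flight = []  # min-heap of arrival rounds for currently tracked rounds
    tracked = []
    for t, d in enumerate(delays):
        # Release slots whose feedback arrived before round t started.
        while in_flight and in_flight[0] < t:
            heapq.heappop(in_flight)
        # Hypothetical greedy rule: track round t if a slot is free.
        if len(in_flight) < capacity:
            heapq.heappush(in_flight, t + d)
            tracked.append(t)
    return tracked


if __name__ == "__main__":
    fixed = [2] * 6  # fixed delay of 2 over 6 rounds
    print(simulate_tracking(fixed, capacity=1))
    print(simulate_tracking(fixed, capacity=3))
```

Under this convention, a fixed delay `d` with capacity `d + 1` lets the greedy rule track every round, which corresponds to the unlimited-capacity setting of classical delayed online learning; smaller capacities force some rounds to go unobserved.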
@article{ryabchenko2025_2503.19856,
  title={Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs},
  author={Alexander Ryabchenko and Idan Attias and Daniel M. Roy},
  journal={arXiv preprint arXiv:2503.19856},
  year={2025}
}