In this work, we give a statistical characterization of the γ-regret for arbitrary structured bandit problems, the regret that arises when comparing against a benchmark that is γ times the optimal solution. The γ-regret emerges in structured bandit problems over a function class F where finding an exact optimum of f ∈ F is intractable. Our characterization is given in terms of the γ-DEC, a statistical complexity parameter for the class F, which is a modification of the constrained Decision-Estimation Coefficient (DEC) of Foster et al., 2023 (and closely related to the original offset DEC of Foster et al., 2021). Our lower bound shows that the γ-DEC is a fundamental limit for any model class F: for any algorithm, there exists some f ∈ F for which the γ-regret of that algorithm scales (nearly) with the γ-DEC of F. We provide an upper bound showing that there exists an algorithm attaining a nearly matching γ-regret. Due to significant challenges in applying the prior results on the DEC to the γ-regret case, both our lower and upper bounds require novel techniques and a new algorithm.
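For orientation, the regret notion described above can be written out as follows. This is a sketch under assumed notation rather than the paper's exact definition: the decision set Π, the horizon T, the played decisions π_t, the true mean-reward function f*, and the range of the approximation factor γ are all labels introduced here for illustration.

```latex
% Assumed notation (not taken from the abstract):
%   \Pi      decision set,        T      number of rounds,
%   \pi_t    decision at round t, f^\star \in \mathcal{F}  true mean-reward function,
%   \gamma \in (0, 1]  approximation factor of the benchmark.
\[
  \mathrm{Reg}_{\gamma}(T)
  \;=\;
  \sum_{t=1}^{T} \Bigl( \gamma \cdot \max_{\pi \in \Pi} f^\star(\pi) \;-\; f^\star(\pi_t) \Bigr)
\]
```

Under this reading, setting γ = 1 recovers the usual regret against the optimal decision, while γ < 1 only asks the learner to compete with a γ-fraction of the optimal value.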