The minimax sample complexity of group distributionally robust optimization (GDRO) has been determined up to a $\log(m)$ factor, where $m$ is the number of groups. In this work, we venture beyond the minimax perspective via a novel notion of sparsity that we dub $(\lambda, \beta)$-sparsity. In short, this condition means that at any parameter $\theta$, there is a set of at most $\beta$ groups whose risks at $\theta$ are all at least $\lambda$ larger than the risks of the other groups. To find an $\epsilon$-optimal $\theta$, we show via a novel algorithm and analysis that the $\epsilon$-dependent term in the sample complexity can swap a linear dependence on $m$ for a linear dependence on the potentially much smaller $\beta$. This improvement leverages recent progress in sleeping bandits, revealing a fundamental connection between the two-player zero-sum game optimization framework for GDRO and per-action regret bounds in sleeping bandits. We next show an adaptive algorithm which, up to logarithmic factors, obtains a sample complexity bound that adapts to the best $(\lambda, \beta)$-sparsity condition that holds. We also show how to obtain a dimension-free semi-adaptive sample complexity bound with a computationally efficient method. Finally, we demonstrate the practicality of the $(\lambda, \beta)$-sparsity condition and the improved sample efficiency of our algorithms on both synthetic and real-life datasets.
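As a rough formalization (the notation $\mathcal{R}_k(\theta)$ for the risk of group $k$, $\Theta$ for the parameter set, and $S_\theta$ for the dominant group set is ours; the paper's exact definition may differ), the $(\lambda, \beta)$-sparsity condition can be sketched as:

% A minimal sketch of (\lambda, \beta)-sparsity; \mathcal{R}_k, \Theta, and
% S_\theta are our own notation, not taken from the paper.
\[
\forall\, \theta \in \Theta \ \ \exists\, S_\theta \subseteq [m],\ |S_\theta| \le \beta,
\quad \text{such that} \quad
\min_{k \in S_\theta} \mathcal{R}_k(\theta)
\;\ge\; \max_{j \notin S_\theta} \mathcal{R}_j(\theta) + \lambda .
\]

Under this reading, only the at most $\beta$ groups in $S_\theta$ can attain the worst-case risk at $\theta$, which is the intuition for why the $\epsilon$-dependent sample complexity term can scale with $\beta$ rather than $m$.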