Matroid Bandits: Fast Combinatorial Optimization with Learning

A matroid is a notion of independence in combinatorial optimization that characterizes problems that can be solved efficiently. In particular, it is well known that the maximum of a constrained modular function can be found greedily if and only if the constraints define a matroid. In this work, we bring together the concepts of matroids and bandits, and propose the first learning algorithm for maximizing a stochastic modular function on a matroid. The function is initially unknown and we learn it by interacting repeatedly with the environment. Our solution has two important properties. First, it is computationally efficient. In particular, its per-step time complexity is O(L log L), where L is the number of items in the ground set of a matroid. Second, it is provably sample efficient. Specifically, we show that the regret of the algorithm is at most linear in all constants of interest and sublinear in time. We also prove a lower bound and argue that our gap-dependent upper bound is tight. Our method is evaluated on three real-world problems and we demonstrate that it is practical.
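To make the greedy idea concrete, here is a minimal Python sketch of maximizing a modular (additive) function over a matroid given an independence oracle. It is an illustration under stated assumptions, not the paper's implementation: the names `greedy_max_basis`, `uniform_matroid`, and `ucb_index` are hypothetical, and the confidence radius in `ucb_index` is a generic UCB1-style choice that may differ from the index the paper analyzes.

```python
import math
from typing import Callable, Hashable, Iterable, List


def greedy_max_basis(
    items: Iterable[Hashable],
    weight: Callable[[Hashable], float],
    is_independent: Callable[[List[Hashable]], bool],
) -> List[Hashable]:
    """Greedily build a maximum-weight basis of a matroid.

    Items are visited in order of decreasing weight and kept whenever the
    independence oracle accepts the enlarged set. Sorting L items costs
    O(L log L); the cost of the oracle calls is extra and depends on the
    matroid.
    """
    basis: List[Hashable] = []
    for item in sorted(items, key=weight, reverse=True):
        if is_independent(basis + [item]):
            basis.append(item)
    return basis


def uniform_matroid(k: int) -> Callable[[List[Hashable]], bool]:
    """Independence oracle for a uniform matroid: any set of size <= k."""
    return lambda subset: len(subset) <= k


def ucb_index(mean: float, pulls: int, t: int) -> float:
    """Optimistic weight estimate (assumed UCB1-style radius).

    Items that have never been observed get an infinite index so that
    they are tried first.
    """
    if pulls == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


# Example: pick the best 2 of 4 items with known weights.
weights = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.5}
best = greedy_max_basis(list(weights), weights.get, uniform_matroid(2))
```

In a bandit setting the true weights are unknown, so one plausible loop is to pass optimistic indices such as `ucb_index` to `greedy_max_basis` in place of the true weights at each step, observe the weights of the chosen items, and update their running means.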