Polymatroid Bandits
A polymatroid is a polyhedron that is closely related to computational efficiency in polyhedral optimization. In particular, it is well known that the maximum of a modular function on a polymatroid can be found greedily. In this work, we bring together the ideas of polymatroids and bandits, and propose a learning variant of maximizing a modular function on a polymatroid, polymatroid bandits. We also propose a computationally efficient algorithm for solving the problem and bound its expected cumulative regret. Our gap-dependent upper bound matches a lower bound in matroid bandits and our gap-free upper bound matches a minimax lower bound in adversarial combinatorial bandits, up to a factor of . We evaluate our algorithm on a simple synthetic problem and compare it to several baselines. Our results demonstrate that the proposed method is practical.
View on arXiv