Polymatroid Bandits

Abstract

A polymatroid is a polytope that is closely related to computational efficiency in polyhedral optimization. In particular, it is well known that the maximum of a modular function on a polymatroid can be found greedily. In this work, we bring together the ideas of polymatroids and bandits, and propose a learning variant of maximizing a modular function on a polymatroid, which we call polymatroid bandits. We also propose a computationally efficient algorithm for solving the problem and bound its expected cumulative regret. Our gap-dependent upper bound matches a lower bound in matroid bandits, and our gap-free upper bound matches a minimax lower bound in adversarial combinatorial bandits, up to logarithmic factors. Finally, we evaluate our algorithm on a movie recommendation problem and show that it can learn how to recommend a set of diverse and popular movies.
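
To make the greedy claim concrete, the following is a minimal sketch of the classical greedy method for maximizing a modular (linear) function over a polymatroid. It illustrates only the offline optimization step with known weights, not the learning algorithm proposed in the paper. The names `greedy_polymatroid_max` and `rank_uniform` are hypothetical, and the polymatroid is assumed to be given by a monotone submodular function `f` with `f(∅) = 0`.

```python
from typing import Callable, FrozenSet, List


def greedy_polymatroid_max(
    weights: List[float],
    f: Callable[[FrozenSet[int]], float],
) -> List[float]:
    """Greedy maximizer of sum_e weights[e] * x[e] over the polymatroid
    {x >= 0 : sum_{e in S} x[e] <= f(S) for every subset S}.

    Assumes f is monotone and submodular with f(empty set) = 0.
    """
    # Visit items in order of decreasing weight.
    order = sorted(range(len(weights)), key=lambda e: weights[e], reverse=True)
    x = [0.0] * len(weights)
    prefix: FrozenSet[int] = frozenset()
    prev = f(prefix)
    for e in order:
        if weights[e] <= 0:
            # Items with non-positive weight cannot increase the objective.
            break
        prefix = prefix | {e}
        cur = f(prefix)
        # Each item receives its marginal gain with respect to the prefix.
        x[e] = cur - prev
        prev = cur
    return x


def rank_uniform(s: FrozenSet[int]) -> float:
    """Illustrative choice of f: rank of a uniform matroid (at most 2 items)."""
    return float(min(len(s), 2))


weights = [0.3, 0.9, 0.5]
x = greedy_polymatroid_max(weights, rank_uniform)  # [0.0, 1.0, 1.0]
value = sum(w * xi for w, xi in zip(weights, x))
```

With the uniform-matroid rank function, the greedy solution simply selects the two highest-weight items. In the bandit setting described above, the weights are unknown and must be learned from repeated plays.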