Best-item Learning in Random Utility Models with Subset Choices

Abstract

We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities. We identify a new property of such a RUM, termed the minimum advantage, that helps in characterizing the complexity of separating pairs of items based on their relative win/loss empirical counts, and can be bounded as a function of the noise distribution alone. We give a learning algorithm for general RUMs, based on pairwise relative counts of items and hierarchical elimination, along with a new PAC sample complexity guarantee of $O\left(\frac{n}{c^2\epsilon^2} \log \frac{k}{\delta}\right)$ rounds to identify an $\epsilon$-optimal item with confidence $1-\delta$, when the worst case pairwise advantage in the RUM has sensitivity at least $c$ to the parameter gaps of items. Fundamental lower bounds on PAC sample complexity show that this is near-optimal in terms of its dependence on $n$, $k$ and $c$.
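
To make the feedback model concrete, the following is a minimal simulation sketch, not the paper's algorithm. It assumes Gumbel noise (which yields the multinomial logit model as one particular RUM), winner-only feedback, and a counting rule in which the winner is credited with a win over every other item in the played subset. The function names `play_subset` and `pairwise_win_counts` are hypothetical and chosen for illustration only.

```python
import numpy as np

def play_subset(utilities, subset, rng):
    """Sample the winner of `subset` under a RUM with i.i.d. noise.

    Assumption: Gumbel noise, one specific RUM instance (multinomial logit).
    """
    subset = np.asarray(subset)
    # Perturb each latent utility independently, then pick the largest.
    perturbed = utilities[subset] + rng.gumbel(size=len(subset))
    return int(subset[np.argmax(perturbed)])

def pairwise_win_counts(utilities, subset, rounds, rng):
    """Play `subset` repeatedly and tally pairwise win counts.

    wins[i, j] is the number of rounds in which i won while j was also played.
    This counting rule is an assumption of the sketch.
    """
    n = len(utilities)
    wins = np.zeros((n, n), dtype=int)
    for _ in range(rounds):
        winner = play_subset(utilities, subset, rng)
        for j in subset:
            if j != winner:
                wins[winner, j] += 1
    return wins

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    utilities = np.array([2.0, 0.0, 0.0])
    wins = pairwise_win_counts(utilities, [0, 1, 2], rounds=2000, rng=rng)
    print(wins)
```

With a large utility gap, the empirical counts `wins[0, 1]` and `wins[1, 0]` separate quickly; smaller gaps or noise distributions with a smaller advantage would need more rounds, which is the dependence the $\frac{1}{c^2\epsilon^2}$ factor in the bound captures.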
