
Active Ranking with Subset-wise Preferences

Abstract

We consider the problem of probably approximately correct (PAC) ranking of $n$ items by adaptively eliciting subset-wise preference feedback. At each round, the learner chooses a subset of $k$ items and observes stochastic feedback indicating the winner (most preferred) item of the chosen subset, drawn according to a Plackett-Luce (PL) subset choice model unknown a priori. The objective is to identify an $\epsilon$-optimal ranking of the $n$ items with probability at least $1 - \delta$. When the feedback in each subset round is a single Plackett-Luce-sampled item, we show $(\epsilon, \delta)$-PAC algorithms with a sample complexity of $O\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta}\right)$ rounds, which we establish as order-optimal by exhibiting a matching sample complexity lower bound of $\Omega\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta}\right)$; this shows that essentially no improvement is possible over the pairwise comparisons setting ($k = 2$). When, however, it is possible to elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln \frac{n}{\delta}\right)$ is achievable with explicit algorithms, which represents an $m$-wise reduction in sample complexity compared to the pairwise case. This again turns out to be order-wise unimprovable across the class of symmetric ranking algorithms. Our algorithms rely on a novel *pivot trick* to maintain only $n$ itemwise score estimates, unlike the $O(n^2)$ pairwise score estimates that have been used in prior work. We report results of numerical experiments that corroborate our findings.
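To make the feedback model concrete, the following is a minimal sketch of Plackett-Luce subset choice sampling as described in the abstract: the winner of a subset $S$ is item $i$ with probability $\theta_i / \sum_{j \in S} \theta_j$, and a top-$m$ ranking is drawn by repeatedly sampling winners without replacement. The function names and score representation are illustrative, not taken from the paper.

```python
import random

def pl_winner(scores, subset, rng=random):
    """Sample the winner of `subset` under a PL model with itemwise scores.

    Item i wins with probability scores[i] / sum(scores[j] for j in subset).
    """
    total = sum(scores[i] for i in subset)
    r = rng.random() * total
    for i in subset:
        r -= scores[i]
        if r <= 0:
            return i
    return subset[-1]  # guard against floating-point rounding

def pl_top_m(scores, subset, m, rng=random):
    """Sample top-m ranking feedback: draw PL winners without replacement."""
    remaining = list(subset)
    ranking = []
    for _ in range(m):
        w = pl_winner(scores, remaining, rng)
        ranking.append(w)
        remaining.remove(w)
    return ranking
```

With $m = 1$ this reduces to the single-winner feedback of the first setting; larger $m$ yields the richer top-$m$ feedback that drives the $m$-wise sample complexity reduction.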
