Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits

23 February 2022
Suprovat Ghoshal
Aadirupa Saha
Abstract

We introduce the Correlated Preference Bandits problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of $n$ items through online subsetwise preference feedback. We investigate whether models with a simple correlation structure, e.g. low rank, can result in faster learning rates. While we show that the problem can be impossible to solve for general "low rank" choice models, faster learning rates can be attained assuming more structured item correlations. In particular, we introduce a new class of Block-Rank based RUM models, where the best item is shown to be $(\epsilon,\delta)$-PAC learnable with only $O(r \epsilon^{-2} \log(n/\delta))$ samples. This improves on the standard sample complexity bound of $\tilde{O}(n\epsilon^{-2} \log(1/\delta))$ known for the usual learning algorithms, which might not exploit the item correlations ($r \ll n$). We complement the above sample complexity with a matching lower bound (up to logarithmic factors), justifying the tightness of our analysis. Surprisingly, we also show a lower bound of $\Omega(n\epsilon^{-2}\log(1/\delta))$ when the learner is forced to play just duels instead of larger subsetwise queries. Further, we extend the results to a more general "noisy Block-Rank" model, which ensures robustness of our techniques. Overall, our results justify the advantage of playing subsetwise queries over pairwise preferences ($k=2$); we show that the latter provably fails to exploit correlation.
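
To get a rough feel for the gap between the two rates quoted in the abstract, the following minimal sketch evaluates both expressions with all constants and hidden logarithmic factors dropped. The function names and the example values of $n$, $r$, $\epsilon$ and $\delta$ are illustrative assumptions, not taken from the paper, and $r$ is assumed to be the rank parameter of the Block-Rank model. The numbers indicate scaling only and are not actual sample counts.

```python
import math


def block_rank_rate(r: int, n: int, eps: float, delta: float) -> float:
    """r * eps^-2 * log(n / delta), with the constant in O(.) dropped."""
    return r * math.log(n / delta) / eps**2


def standard_rate(n: int, eps: float, delta: float) -> float:
    """n * eps^-2 * log(1 / delta), with constants and hidden log factors dropped."""
    return n * math.log(1.0 / delta) / eps**2


# Hypothetical example values, chosen only so that r << n.
n, r, eps, delta = 1000, 5, 0.1, 0.05

fast = block_rank_rate(r, n, eps, delta)
slow = standard_rate(n, eps, delta)
ratio = slow / fast

print(f"block-rank rate: {fast:.0f}")
print(f"standard rate:   {slow:.0f}")
print(f"ratio:           {ratio:.1f}")
```

The ratio grows roughly like $n/r$, damped by the extra $\log n$ term in the Block-Rank rate, which is why the improvement matters mainly when $r \ll n$.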
