Pure Exploration with Structured Preference Feedback

12 April 2021
Shubham Gupta
Aadirupa Saha
S. Katariya
arXiv:2104.05294 · PDF · HTML
Abstract

We consider the problem of pure exploration with subset-wise preference feedback, which contains $N$ arms with features. The learner is allowed to query subsets of size $K$ and receives feedback in the form of a noisy winner. The goal of the learner is to identify the best arm efficiently using as few queries as possible. This setting is relevant in various online decision-making scenarios involving human feedback, such as online retailing, streaming services, news feeds, and online advertising, since it is easier and more reliable for people to choose a preferred item from a subset than to assign a likability score to an item in isolation. To the best of our knowledge, this is the first work that considers the subset-wise preference feedback model in a structured setting, which allows for a potentially infinite set of arms. We present two algorithms that guarantee the detection of the best arm in $\tilde{O}\left(\frac{d^2}{K \Delta^2}\right)$ samples with probability at least $1 - \delta$, where $d$ is the dimension of the arm features and $\Delta$ is the appropriate notion of utility gap among the arms. We also derive an instance-dependent lower bound of $\Omega\left(\frac{d}{\Delta^2} \log \frac{1}{\delta}\right)$, which matches our upper bound on a worst-case instance. Finally, we run extensive experiments to corroborate our theoretical findings, and observe that our adaptive algorithm stops early and requires up to 12x fewer samples than a non-adaptive algorithm.
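
To make the feedback model concrete, here is a minimal Python sketch of the setting described in the abstract: arms with $d$-dimensional features, linear utilities, and a noisy winner observed from each queried size-$K$ subset. The multinomial-logit winner distribution and all variable names (d, K, theta_star, the random arm features) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch (not the authors' code) of subset-wise preference feedback
# with structured (linear) arm utilities. The multinomial-logit winner model
# and all constants below are assumptions chosen for illustration.

rng = np.random.default_rng(0)

d, n_arms, K = 5, 50, 4              # feature dimension, number of arms, subset size
arms = rng.normal(size=(n_arms, d))  # arm feature vectors (the "structured" part)
theta_star = rng.normal(size=d)      # unknown parameter defining arm utilities
utilities = arms @ theta_star        # utility of each arm under the linear model

def query_subset(subset):
    """Return the index of a noisy winner from `subset` (size K),
    sampled proportionally to exp(utility) -- an MNL-style assumption."""
    logits = utilities[subset]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return subset[rng.choice(len(subset), p=probs)]

# One query: the learner picks a size-K subset and observes a single winner.
subset = rng.choice(n_arms, size=K, replace=False)
winner = query_subset(subset)
print(f"queried arms {subset.tolist()}, observed winner {winner}")
print(f"true best arm: {utilities.argmax()}")
```

A pure-exploration algorithm in this setting would choose the queried subsets adaptively and stop once it can certify, with probability at least $1 - \delta$, which arm maximizes the utility.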

View on arXiv