Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

24 June 2022
Yifan Lin, Yuhao Wang, Enlu Zhou
Abstract

In this paper we consider the contextual multi-armed bandit problem with linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson Sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\big((1+\rho+\frac{1}{\rho})\, d \ln T \, \ln\frac{K}{\delta} \, \sqrt{d K T^{1+2\epsilon} \ln\frac{K}{\delta} \, \frac{1}{\epsilon}}\big)$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0<\epsilon<\frac{1}{2}$ and $0<\delta<1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
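The abstract does not spell out the algorithm, but the ingredients it names (Thompson Sampling, a disjoint linear model, a mean-variance objective with risk tolerance $\rho$) can be combined into a rough sketch. The version below is an illustrative assumption, not the paper's method: each arm keeps its own ridge-regression posterior, the sampled mean is scored against a crude empirical variance estimate via the objective $\mu - \rho\,\sigma^2$, and the function names (`mean_variance_thompson`, `contexts`, `pull`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_variance_thompson(contexts, pull, T, K, d, rho=1.0, v=0.5, lam=1.0):
    """Illustrative sketch: Thompson Sampling for a disjoint linear model
    scored by an assumed mean-variance objective mu - rho * sigma^2.
    `contexts(t)` returns a (K, d) array of per-arm feature vectors;
    `pull(a, x)` returns the observed reward for arm a with context x."""
    A = [lam * np.eye(d) for _ in range(K)]   # per-arm ridge precision matrices
    b = [np.zeros(d) for _ in range(K)]       # per-arm response vectors
    n = np.zeros(K)                           # per-arm pull counts
    mean_sum = np.zeros(K)                    # running sums for a crude
    sq_sum = np.zeros(K)                      # empirical variance estimate
    for t in range(T):
        X = contexts(t)
        scores = np.empty(K)
        for a in range(K):
            # Posterior mean and a Gaussian posterior sample for arm a.
            theta_hat = np.linalg.solve(A[a], b[a])
            theta_tilde = rng.multivariate_normal(
                theta_hat, v**2 * np.linalg.inv(A[a]))
            mu = X[a] @ theta_tilde
            # Empirical reward variance (0 until the arm has enough pulls).
            if n[a] > 1:
                var = sq_sum[a] / n[a] - (mean_sum[a] / n[a]) ** 2
            else:
                var = 0.0
            scores[a] = mu - rho * var        # assumed mean-variance score
        a_star = int(np.argmax(scores))
        r = pull(a_star, X[a_star])
        # Disjoint model: only the pulled arm's statistics are updated.
        A[a_star] += np.outer(X[a_star], X[a_star])
        b[a_star] += r * X[a_star]
        n[a_star] += 1
        mean_sum[a_star] += r
        sq_sum[a_star] += r**2
    return n
```

With $\rho = 0$ this reduces to plain linear Thompson Sampling on the disjoint model; larger $\rho$ penalizes arms whose observed rewards fluctuate more, which is the risk-averse behavior the abstract describes.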
