Adaptive Data Augmentation for Thompson Sampling

17 June 2025

Wonyoung Kim


Main: 29 pages, 3 figures, 2 tables. Bibliography: 4 pages.

Abstract

In linear contextual bandits, the objective is to select actions that maximize cumulative reward, where the expected reward is modeled as a linear function of the context with unknown parameters. Although Thompson Sampling performs well empirically, it does not achieve optimal regret bounds. This paper proposes a nearly minimax-optimal Thompson Sampling algorithm for linear contextual bandits, built on a novel estimator that adaptively augments and couples hypothetical samples designed for efficient parameter learning. The proposed estimator accurately predicts rewards for all arms without relying on assumptions about the context distribution. Empirical results show robust performance and significant improvement over existing methods.
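For orientation, the setting described in the abstract can be illustrated with a minimal sketch of the standard Gaussian-posterior linear Thompson Sampling baseline. This is not the paper's estimator: it omits the adaptive augmentation and coupling of hypothetical samples entirely. The function name `linear_thompson_sampling` and the parameters `noise_sd`, `v` (posterior scale), and `lam` (ridge regularizer) are hypothetical choices made for illustration, and the simulated Gaussian reward noise is an assumption.

```python
import numpy as np


def linear_thompson_sampling(contexts, theta_star, noise_sd=0.1, v=1.0, lam=1.0, seed=0):
    """Baseline linear Thompson Sampling on simulated data (illustrative only).

    contexts:   array of shape (T, K, d), one d-dimensional context per arm per round
    theta_star: array of shape (d,), the parameter used to simulate rewards
    Returns the final ridge estimate and the cumulative regret after each round.
    """
    rng = np.random.default_rng(seed)
    T, K, d = contexts.shape
    A = lam * np.eye(d)   # regularized Gram matrix of the chosen contexts
    b = np.zeros(d)       # sum of reward-weighted chosen contexts
    total = 0.0
    cumulative_regret = []

    for t in range(T):
        X = contexts[t]                       # (K, d) contexts for this round
        mu = np.linalg.solve(A, b)            # ridge estimate of theta
        # Draw theta_tilde ~ N(mu, v^2 A^{-1}) using the Cholesky factor A = L L^T:
        # L^{-T} z has covariance A^{-1} when z is standard normal.
        L = np.linalg.cholesky(A)
        z = rng.standard_normal(d)
        theta_tilde = mu + v * np.linalg.solve(L.T, z)

        arm = int(np.argmax(X @ theta_tilde))  # act greedily w.r.t. the sample

        means = X @ theta_star                 # true expected rewards (simulation only)
        reward = means[arm] + noise_sd * rng.standard_normal()

        # Only the chosen arm's context and reward enter the update.
        A += np.outer(X[arm], X[arm])
        b += reward * X[arm]

        total += means.max() - means[arm]
        cumulative_regret.append(total)

    theta_hat = np.linalg.solve(A, b)
    return theta_hat, cumulative_regret


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    contexts = rng.normal(size=(500, 5, 3))
    theta_star = np.array([0.5, -0.3, 0.8])
    theta_hat, regret = linear_thompson_sampling(contexts, theta_star)
    print("estimate:", theta_hat)
    print("final cumulative regret:", regret[-1])
```

In this baseline the estimate is updated only from the arm actually played in each round, which is the kind of limitation the abstract's claim about predicting rewards for all arms is set against.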


@article{kim2025_2506.14479,
  title={Adaptive Data Augmentation for Thompson Sampling},
  author={Wonyoung Kim},
  journal={arXiv preprint arXiv:2506.14479},
  year={2025}
}
