Causal Bandits with Propagating Inference

6 June 2018
Akihiro Yabe
Daisuke Hatano
Hanna Sumita
Shinji Ito
Naonori Kakimura
Takuro Fukunaga
Ken-ichi Kawarabayashi
Abstract

Bandit is a framework for designing sequential experiments. In each experiment, a learner selects an arm $A \in \mathcal{A}$ and obtains an observation corresponding to $A$. Theoretically, the tight regret lower bound for the general bandit is polynomial in the number of arms $|\mathcal{A}|$. This makes the bandit framework incapable of handling an exponentially large number of arms, so the bandit problem with side-information is often considered in order to overcome this lower bound. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information. A causal graph is a fundamental model that is frequently used in a variety of real problems. In this setting, the arms are identified with interventions on a given causal graph, and the effect of an intervention propagates throughout the causal graph. The task is to find the best intervention, that is, the one that maximizes the expected value at a target node. Existing algorithms for causal bandits overcame the $\Omega(\sqrt{|\mathcal{A}|/T})$ simple-regret lower bound; however, they work only when the interventions $\mathcal{A}$ are localized around a single node (i.e., an intervention propagates only to its neighbors). We propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves an $O(\sqrt{\gamma^* \log(|\mathcal{A}|T) / T})$ regret bound, where $\gamma^*$ is determined by the structure of the causal graph. In particular, if the in-degree of the causal graph is bounded, then $\gamma^* = O(N^2)$, where $N$ is the number of nodes.
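To get a feel for the gap between the two rates quoted in the abstract, the following minimal sketch evaluates both expressions numerically with all hidden constants dropped. The concrete numbers are illustrative assumptions, not values from the paper: $N = 20$ nodes, $|\mathcal{A}| = 2^N$ arms (one way the arm set could be exponentially large), $\gamma^* = N^2$ (taking the constant in $O(N^2)$ to be 1), and $T = 100{,}000$ rounds. The function names are hypothetical.

```python
import math


def general_bandit_rate(num_arms: int, horizon: int) -> float:
    """Order of the sqrt(|A| / T) simple-regret lower bound, constants dropped."""
    return math.sqrt(num_arms / horizon)


def causal_bandit_rate(gamma_star: float, num_arms: int, horizon: int) -> float:
    """Order of the sqrt(gamma* log(|A| T) / T) upper bound, constants dropped."""
    return math.sqrt(gamma_star * math.log(num_arms * horizon) / horizon)


# Illustrative assumptions only (not taken from the paper).
N = 20                 # number of nodes in the causal graph
num_arms = 2 ** N      # assumed exponentially large set of interventions
gamma_star = N ** 2    # assumes the hidden constant in O(N^2) equals 1
T = 100_000            # number of rounds

general = general_bandit_rate(num_arms, T)
causal = causal_bandit_rate(gamma_star, num_arms, T)

print(f"general-bandit rate: {general:.3f}")
print(f"causal-bandit rate:  {causal:.3f}")
```

Under these assumptions the causal rate comes out roughly an order of magnitude smaller, because $|\mathcal{A}|$ enters only through a logarithm. With real constants the comparison could shift, so this should be read as an order-of-magnitude illustration only.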
