
Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback

Abstract

Many real-world problems, such as Social Influence Maximization, face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit that chooses $K$ out of $N$ arms at each time step, with the aim of achieving an efficient trade-off between exploration and exploitation. This is the first work on combinatorial bandits in which the feedback received can be a non-linear function of the chosen $K$ arms. Applying a standard multi-armed bandit algorithm directly requires choosing among $\binom{N}{K}$ options, which makes the state space large. In this paper, we present a novel algorithm that is computationally efficient and whose storage is linear in $N$. The proposed algorithm, which we call CMAB-SM, is a divide-and-conquer based strategy. Further, it achieves a regret bound of $\tilde{O}(K^{\frac{1}{2}} N^{\frac{1}{3}} T^{\frac{2}{3}})$ for a time horizon $T$, which is sub-linear in all parameters $T$, $N$, and $K$.
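
To give a rough sense of the scale involved, the sketch below compares the number of $K$-subsets a naive bandit would treat as separate arms with the number of base arms $N$, and evaluates the growth term $K^{1/2} N^{1/3} T^{2/3}$ from the stated bound. It is only an arithmetic illustration, not an implementation of CMAB-SM; the function names and the example values of $N$, $K$, and $T$ are assumptions chosen for illustration, and constants and logarithmic factors hidden by $\tilde{O}$ are ignored.

```python
from math import comb


def naive_arm_count(n: int, k: int) -> int:
    """Number of size-k subsets of n base arms (hypothetical helper).

    A bandit that treats every subset as its own arm must track this many options.
    """
    return comb(n, k)


def regret_growth_term(n: int, k: int, t: int) -> float:
    """Evaluate K^(1/2) * N^(1/3) * T^(2/3) (hypothetical helper).

    Constants and log factors hidden by the tilde-O notation are dropped,
    so this only shows how the bound scales, not its actual value.
    """
    return (k ** 0.5) * (n ** (1.0 / 3.0)) * (t ** (2.0 / 3.0))


if __name__ == "__main__":
    # Example sizes are arbitrary assumptions, not values from the paper.
    for n, k in [(10, 3), (50, 5), (100, 10)]:
        print(f"N={n:>3}, K={k:>2}: subsets={naive_arm_count(n, k)}, base arms={n}")

    # Per-round growth term shrinks as T grows, since the bound is sub-linear in T.
    for t in [10**3, 10**6]:
        print(f"T={t}: term/T = {regret_growth_term(100, 10, t) / t:.4f}")
```

The subset count grows combinatorially in $N$ and $K$, which is why storage linear in $N$ matters, while the ratio of the growth term to $T$ decreases as $T$ increases.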
