33
13

Optimal Streaming Algorithms for Multi-Armed Bandits

Abstract

This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of nn arms with reward distributions supported on [0,1][0,1] with unknown means. The arms in the stream are arriving one by one, and the algorithm cannot access an arm unless it is stored in a limited size memory. We first study the streaming \eps-toptop-kk arms identification problem, which asks for kk arms whose reward means are lower than that of the kk-th best arm by at most \eps\eps with probability at least 1δ1-\delta. For general \eps(0,1)\eps \in (0,1), the existing solution for this problem assumes k=1k = 1 and achieves the optimal sample complexity O(n\eps2log1δ)O(\frac{n}{\eps^2} \log \frac{1}{\delta}) using O(log(n))O(\log^*(n)) (log(n)\log^*(n) equals the number of times that we need to apply the logarithm function on nn before the results is no more than 1.) memory and a single pass of the stream. We propose an algorithm that works for any kk and achieves the optimal sample complexity O(n\eps2logkδ)O(\frac{n}{\eps^2} \log\frac{k}{\delta}) using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least 1δ1-\delta probability, using a single-arm memory and as few passes of the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within O(logΔ21)O(\log \Delta_2^{-1}) passes, where Δ2\Delta_2 is the gap between the mean of the best arm and that of the second best arm.

View on arXiv
Comments on this paper