What advantage do \emph{sequential} procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions $p$ and $q$ on $\{1,\dots,n\}$ are equal or $\varepsilon$-far, we give several answers to this question. We show that for a small alphabet size $n$, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least in terms of sample complexity. For a general alphabet size $n$, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance between $p$ and $q$ is larger than $\varepsilon$. As a corollary, letting $\varepsilon$ go to $0$, we obtain a sequential algorithm for testing closeness when no a priori bound on the distance between $p$ and $q$ is given that has a sample complexity : this improves over the tester of \cite{daskalakis2017optimal} and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for testing identity and closeness: they can improve the worst-case number of samples by at most a constant factor.