Early Stopping Based on Repeated Significance

Abstract
For a bucket test with a single criterion for success and a fixed number of samples or testing period, requiring a p-value less than a specified value of α for the success criterion produces statistical confidence at level 1 − α. For multiple criteria, a Bonferroni correction that partitions α among the criteria produces statistical confidence, at the cost of requiring lower p-values for each criterion. The same concept can be applied to decisions about early stopping, but that can lead to strict requirements for p-values. We show how to address that challenge by requiring criteria to be successful at multiple decision points.
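To make the cost of the Bonferroni approach concrete, the following is a minimal sketch of how the per-test p-value threshold shrinks when α is split across both criteria and early-stopping decision points. It assumes an equal partition of α (Bonferroni also permits unequal partitions), and the function names are hypothetical, chosen for illustration. It shows only the baseline correction, not the repeated-significance method the paper proposes.

```python
from typing import Sequence


def bonferroni_threshold(alpha: float, num_criteria: int,
                         num_decision_points: int = 1) -> float:
    """Per-test p-value threshold under an equal Bonferroni split.

    Assumption: alpha is divided evenly over every (criterion, decision
    point) pair, so the threshold is alpha / (criteria * decision points).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if num_criteria < 1 or num_decision_points < 1:
        raise ValueError("counts must be at least 1")
    return alpha / (num_criteria * num_decision_points)


def all_criteria_pass(p_values: Sequence[float], alpha: float,
                      num_decision_points: int = 1) -> bool:
    """True if every criterion's p-value is below the per-test threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values), num_decision_points)
    return all(p < threshold for p in p_values)


if __name__ == "__main__":
    # One criterion, one decision point: the threshold is alpha itself.
    print(bonferroni_threshold(0.05, 1))
    # Two criteria checked at five decision points: a much stricter threshold.
    print(bonferroni_threshold(0.05, 2, 5))
```

With two criteria and five decision points, p-values that would pass a fixed-horizon test at α = 0.05 can fail the early-stopping test, which is the strictness the abstract refers to.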