
Early Stopping Based on Repeated Significance

Abstract

For a bucket test with a single criterion for success and a fixed number of samples or testing period, requiring a $p$-value less than a specified value of $\alpha$ for the success criterion produces statistical confidence at level $1 - \alpha$. For multiple criteria, a Bonferroni correction that partitions $\alpha$ among the criteria produces statistical confidence, at the cost of requiring lower $p$-values for each criterion. The same concept can be applied to decisions about early stopping, but that can lead to strict requirements for $p$-values. We show how to address that challenge by requiring criteria to be successful at multiple decision points.
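As a rough illustration of why a Bonferroni partition becomes strict under early stopping, the sketch below splits $\alpha$ evenly across every criterion at every decision point. The even split, the function names, and the parameter names are assumptions made for illustration; this shows only the baseline Bonferroni approach described above, not the multiple-decision-point method the paper proposes.

```python
def bonferroni_threshold(alpha, num_criteria, num_decision_points=1):
    """Per-test p-value threshold when alpha is split evenly (an assumption)
    across all criteria and all decision points."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if num_criteria < 1 or num_decision_points < 1:
        raise ValueError("counts must be positive integers")
    return alpha / (num_criteria * num_decision_points)


def all_criteria_pass(p_values, alpha, num_decision_points=1):
    """True if every criterion's p-value is below its Bonferroni share of alpha.

    p_values holds one p-value per success criterion at a single decision point.
    """
    threshold = bonferroni_threshold(alpha, len(p_values), num_decision_points)
    return all(p < threshold for p in p_values)


if __name__ == "__main__":
    # Two criteria, fixed-length test: each p-value must fall below 0.05 / 2.
    print(all_criteria_pass([0.01, 0.02], alpha=0.05))
    # Same p-values with five possible stopping points: the threshold
    # shrinks to 0.05 / (2 * 5), so the same evidence no longer suffices.
    print(all_criteria_pass([0.01, 0.02], alpha=0.05, num_decision_points=5))
```

With two criteria and five decision points, each individual $p$-value would need to fall below $0.005$ for an overall $\alpha$ of $0.05$, which is the kind of strict requirement the abstract refers to.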
