Methods to balance pre-treatment measurements (baseline covariates), including blocking, pairwise matching, and re-randomization, are in pervasive use throughout the practice of controlled experimentation. However, we argue that no balance better than that of complete randomization can be achieved uniformly without partial structural knowledge about the treatment effects, in particular about their expectation conditioned on the baseline covariates, known as the regression function. We propose a novel formulation of such structural knowledge as membership in a possibly infinite-dimensional normed vector space of functions, a mild assumption that can be made nearly null. We then propose choosing the experimental design that is optimal with respect to the worst-case realization of the regression function. This allows us to bound the variance of the resulting estimator and show that it converges quickly to the best possible. We characterize the consistency of the estimator, formulate the design problem, and develop inferential algorithms.
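As a rough illustration of the worst-case idea, consider one particular choice of function space that is assumed here purely for the example: linear functions f(x) = b·x normed by the Euclidean length of b. Over the unit ball of that space, the largest possible gap between the two groups' average of f equals the Euclidean norm of the difference in covariate means (by the Cauchy-Schwarz inequality). The sketch below brute-forces the equal-sized split that minimizes this quantity on a toy data set. The function names `worst_case_imbalance` and `minimax_design` are hypothetical, and this is not the paper's algorithm, only a minimal instance of the minimax formulation under the stated assumption.

```python
from itertools import combinations

import numpy as np


def worst_case_imbalance(X, treated):
    """Worst-case gap in group averages of f over linear f(x) = b @ x, ||b||_2 <= 1.

    By Cauchy-Schwarz this supremum equals the Euclidean norm of the
    difference between the treated and control covariate means.
    """
    mask = np.zeros(len(X), dtype=bool)
    mask[list(treated)] = True
    gap = X[mask].mean(axis=0) - X[~mask].mean(axis=0)
    return float(np.linalg.norm(gap))


def minimax_design(X):
    """Exhaustively search equal-sized splits for the smallest worst-case imbalance.

    Only feasible for small n: the number of splits grows combinatorially.
    """
    n = len(X)
    best = min(
        combinations(range(n), n // 2),
        key=lambda t: worst_case_imbalance(X, t),
    )
    return best, worst_case_imbalance(X, best)


# Toy baseline covariates: four units with a single covariate each.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
toy_treated, toy_value = minimax_design(X_toy)
print(toy_treated, toy_value)

# A slightly larger synthetic example with two covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
treated, value = minimax_design(X)
print(treated, value)
```

Because the objective is unchanged when the two groups swap labels, a fair coin flip could decide which half receives treatment without affecting the imbalance. Richer function spaces would lead to different imbalance measures and therefore different designs.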