Compress Then Test: Powerful Kernel Testing in Near-linear Time

Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on sample points. However, existing kernel tests either run in quadratic time or sacrifice undue power to improve runtime. To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximates an expensive test by compressing each point sample into a small but provably high-fidelity coreset. For standard kernels and subexponential distributions, CTT inherits the statistical behavior of a quadratic-time test -- recovering the same optimal detection boundary -- while running in near-linear time. We couple these advances with cheaper permutation testing, justified by new power analyses; improved time-vs.-quality guarantees for low-rank approximation; and a fast aggregation procedure for identifying especially discriminating kernels. In our experiments with real and simulated data, CTT and its extensions provide 20--200x speed-ups over state-of-the-art approximate MMD tests with no loss of power.
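To make the "expensive test" concrete, the following is a minimal sketch of a quadratic-time MMD permutation test, followed by a compress-then-test style call that runs the same test on much smaller point sets. Everything here is an illustrative assumption rather than the paper's implementation: the Gaussian kernel, the bandwidth, the biased MMD estimate, and the helper names are hypothetical, and `subsample` is plain uniform subsampling standing in for the high-fidelity coreset construction that CTT actually uses.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise Gaussian kernel matrix between the rows of a and the rows of b.
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_from_gram(gram, is_x):
    # Biased squared MMD: mean(K_xx) + mean(K_yy) - 2 * mean(K_xy),
    # written as a quadratic form with signed uniform weights.
    w = np.where(is_x, 1.0 / is_x.sum(), -1.0 / (~is_x).sum())
    return float(w @ gram @ w)

def mmd_permutation_test(x, y, n_perms=200, bandwidth=1.0, seed=0):
    # Quadratic-time test: one Gram matrix over all points, reused across
    # every permutation of the group labels.
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    gram = gaussian_kernel(z, z, bandwidth)
    is_x = np.arange(len(z)) < len(x)
    observed = mmd2_from_gram(gram, is_x)
    null = [mmd2_from_gram(gram, rng.permutation(is_x)) for _ in range(n_perms)]
    p_value = (1 + sum(s >= observed for s in null)) / (1 + n_perms)
    return observed, p_value

def subsample(points, size, rng):
    # ASSUMPTION: uniform subsampling as a stand-in compressor.
    # This is NOT the coreset procedure described in the paper.
    idx = rng.choice(len(points), size=size, replace=False)
    return points[idx]

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(1.0, 1.0, size=(200, 2))

# Full quadratic-time test on all 400 points.
stat_full, p_full = mmd_permutation_test(x, y)

# Same test on compressed samples: the Gram matrix shrinks from 400x400 to 80x80.
stat_small, p_small = mmd_permutation_test(subsample(x, 40, rng), subsample(y, 40, rng))
```

The sketch only shows the structure of the approach: the cost of the test is driven by the size of the Gram matrix, so shrinking each sample before testing reduces runtime. Uniform subsampling generally loses power as the samples shrink, which is the gap a higher-fidelity compression step is meant to close.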
@article{domingo-enrich2025_2301.05974,
  title={Compress Then Test: Powerful Kernel Testing in Near-linear Time},
  author={Carles Domingo-Enrich and Raaz Dwivedi and Lester Mackey},
  journal={arXiv preprint arXiv:2301.05974},
  year={2025}
}