Change point detection in high dimensional data with U-statistics

We consider the problem of detecting distributional changes in a sequence of high dimensional data. Our proposed methods are nonparametric, suitable for either continuous or discrete data, and are based on weighted cumulative sums of U-statistics stemming from $L_p$ norms. We establish the asymptotic distribution of our proposed test statistics separately in the cases of weakly dependent and strongly dependent coordinates as $\min\{N, d\} \to \infty$, where $N$ denotes the sample size and $d$ is the dimension. We also provide sufficient conditions for consistency of the proposed test procedures under a general fixed alternative with one change point. We further assess the finite sample performance of the test procedures through Monte Carlo studies, and conclude with two applications to Twitter data concerning the mentions of U.S. Governors and the frequency of tweets containing social justice keywords.
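
To make the idea of a weighted cumulative sum of U-statistics concrete, the following is a minimal illustrative sketch and not the test statistic defined in the paper. It assumes a pairwise kernel given by a coordinate-averaged $L_p$ distance, a centered partial-sum process of that kernel, and a weight of the form $(t(1-t))^{-\kappa}$. The function names `pairwise_lp` and `cusum_u_scan`, the parameter `kappa`, and the simulated mean shift are all hypothetical choices made for illustration. No normalization or critical values are included.

```python
import numpy as np

def pairwise_lp(X, p=2.0):
    """Kernel matrix h(X_i, X_j) = (mean_l |X_il - X_jl|^p)^(1/p) (assumed form)."""
    diff = np.abs(X[:, None, :] - X[None, :, :])    # shape (N, N, d)
    return np.mean(diff ** p, axis=2) ** (1.0 / p)  # shape (N, N)

def cusum_u_scan(X, p=2.0, kappa=0.0):
    """Weighted CUSUM-type scan built from partial U-statistic sums (illustrative)."""
    N = X.shape[0]
    H = pairwise_lp(X, p)
    # U[k-1] = sum of h(X_i, X_j) over all pairs i < j among the first k rows.
    U = np.cumsum(np.triu(H, k=1).sum(axis=0))
    k = np.arange(1, N + 1)
    # Subtract the share of the full-sample sum expected when there is no change.
    centered = U - k * (k - 1) / (N * (N - 1)) * U[-1]
    t = k / N
    weights = np.zeros(N)
    interior = k < N                                # t = 1 gets weight 0
    weights[interior] = (t[interior] * (1.0 - t[interior])) ** (-kappa)
    stat = weights * np.abs(centered)
    k_hat = int(k[np.argmax(stat)])                 # candidate change location
    return stat, k_hat

# Simulated example (assumed setup): N = 200, d = 50, mean shift after row 100.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
X[100:] += 1.5
stat, k_hat = cusum_u_scan(X, p=2.0, kappa=0.0)
print(k_hat)
```

In this sketch the scan returns the index that maximizes the weighted, centered partial sum as a candidate change location. Deciding whether a change is statistically significant would require the normalization and limiting distributions established in the paper.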