Change point detection in high dimensional data with U-statistics

Abstract

We consider the problem of detecting distributional changes in a sequence of high dimensional data. Our proposed methods are nonparametric, suitable for either continuous or discrete data, and are based on weighted cumulative sums of U-statistics stemming from $L_p$ norms. We establish the asymptotic distribution of our proposed test statistics separately in cases of weakly dependent and strongly dependent coordinates as $\min\{N,d\}\to\infty$, where $N$ denotes sample size and $d$ is the dimension, and also provide sufficient conditions for consistency of the proposed test procedures under a general fixed alternative with one change point. We further assess finite sample performance of the test procedures through Monte Carlo studies, and conclude with two applications to Twitter data concerning the mentions of U.S. Governors and the frequency of tweets containing social justice keywords.
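To make the idea of a weighted cumulative sum of U-statistics built from $L_p$ norms concrete, the following is a minimal sketch and not the statistic defined in the paper. It assumes an energy-distance-style two-sample U-statistic on coordinate-averaged $L_p$ distances and a hypothetical weight $(t(1-t))^{\gamma}$ with $t = k/N$; the function names `lp_distance_matrix` and `cusum_u_scan` and the parameters `gamma` and `min_seg` are illustrative choices, and no critical values are computed.

```python
import numpy as np


def lp_distance_matrix(X, p=2.0):
    """Pairwise distances (mean_j |x_ij - x_kj|^p)^(1/p) for rows of X (N x d)."""
    diff = np.abs(X[:, None, :] - X[None, :, :]) ** p
    return diff.mean(axis=2) ** (1.0 / p)


def cusum_u_scan(X, p=2.0, gamma=0.5, min_seg=2):
    """Scan candidate change points k and return (k_hat, stats).

    Assumption: for each split, the statistic is
    2 * (between-segment mean distance) minus the two within-segment
    mean distances, multiplied by the illustrative weight (t(1-t))^gamma.
    """
    N = X.shape[0]
    D = lp_distance_matrix(X, p)
    stats = np.full(N, -np.inf)
    for k in range(min_seg, N - min_seg + 1):
        # Within-segment averages over distinct pairs (the diagonal of D is zero).
        within_left = D[:k, :k].sum() / (k * (k - 1))
        within_right = D[k:, k:].sum() / ((N - k) * (N - k - 1))
        # Average distance between the two segments.
        between = D[:k, k:].mean()
        t = k / N
        stats[k] = (t * (1 - t)) ** gamma * (2 * between - within_left - within_right)
    return int(np.argmax(stats)), stats


# Usage on simulated data: a mean shift in every coordinate after observation 30.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 50))
X[30:] += 2.0
k_hat, stats = cusum_u_scan(X)
print("estimated change point:", k_hat)
```

The scan only locates the split with the largest weighted statistic; turning it into a test would require the limiting distributions established in the paper, which this sketch does not attempt to reproduce.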
