
A simple measure of conditional dependence

Abstract

We propose a coefficient of conditional dependence between two random variables $Y$ and $Z$ given a set of other variables $X_1,\ldots,X_p$, based on an i.i.d. sample. The coefficient has a long list of desirable properties, the most important of which is that under absolutely no distributional assumptions, it converges to a limit in $[0,1]$, where the limit is $0$ if and only if $Y$ and $Z$ are conditionally independent given $X_1,\ldots,X_p$, and is $1$ if and only if $Y$ is equal to a measurable function of $Z$ given $X_1,\ldots,X_p$. Moreover, it has a natural interpretation as a nonlinear generalization of the familiar partial $R^2$ statistic for measuring conditional dependence by regression. Using this statistic, we devise a new variable selection algorithm, called Feature Ordering by Conditional Independence (FOCI), which is model-free, has no tuning parameters, and is provably consistent under sparsity assumptions. A number of applications to synthetic and real datasets are worked out.
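For orientation, the classical partial $R^2$ that the abstract refers to can be sketched as follows. This is not the proposed coefficient, only the familiar linear baseline it is said to generalize: the fraction of the residual sum of squares left after regressing $Y$ on $X_1,\ldots,X_p$ that is removed by also including $Z$. The function name `partial_r2` and its argument layout are assumptions made for this illustration, and ordinary least squares with an intercept is assumed.

```python
import numpy as np


def partial_r2(y, z, x):
    """Classical (linear) partial R^2 of z for y, given x.

    Hypothetical helper for illustration only; it is the linear
    baseline, not the coefficient proposed in the paper.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    x = np.asarray(x, dtype=float).reshape(n, -1)
    z = np.asarray(z, dtype=float).reshape(n, -1)
    ones = np.ones((n, 1))

    # Reduced model: intercept + X.  Full model: intercept + X + Z.
    reduced = np.hstack([ones, x])
    full = np.hstack([ones, x, z])

    def sse(design):
        # Residual sum of squares of the least-squares fit of y on design.
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid)

    sse_reduced = sse(reduced)
    sse_full = sse(full)
    # Share of the reduced model's unexplained variation explained by Z.
    return (sse_reduced - sse_full) / sse_reduced


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2000, 2))
    z = rng.normal(size=2000)
    y_linear = x[:, 0] + 3.0 * z   # Y is a linear function of (X, Z)
    y_square = x[:, 0] + z ** 2    # Y depends on Z only nonlinearly
    print(partial_r2(y_linear, z, x))
    print(partial_r2(y_square, z, x))
```

In the second call $Y$ is an exact function of $Z$ and $X_1$, yet the linear statistic is typically close to zero because $Z$ and $Z^2$ are uncorrelated for a symmetric $Z$. Dependence of this kind is what a nonlinear generalization is meant to detect.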
