Finding Statistically Significant Attribute Interactions

22 December 2016

Abstract

In many data exploration tasks it is meaningful to identify groups of attribute interactions that are specific to a variable of interest. These interactions are also useful in several practical applications, for example to gain insight into the structure of the data, in feature selection, and in data anonymisation. We present a novel method, based on statistical significance testing, that can be used to test whether a data set has been generated by a given factorized class-conditional joint distribution, where the distribution is parametrized by a partition of its attributes. Furthermore, we provide a method, named ASTRID, that automatically finds a partition of the attributes describing the distribution that generated the data. State-of-the-art classifiers are used to capture the interactions present in the data: attribute interactions are systematically broken, and the effect of this breaking on classifier performance is observed. We empirically demonstrate the utility of the proposed method on real and synthetic data, as well as with usage examples.
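To make the idea of "breaking attribute interactions" concrete, the following is a minimal sketch of one way to sample from a class-conditional distribution that factorizes over a partition of the attributes. It assumes the breaking is done by permuting each attribute group independently within each class, which the abstract does not spell out. The function name `shuffle_within_class` and its parameters are hypothetical and are not taken from the paper or its software.

```python
import numpy as np

def shuffle_within_class(X, y, partition, rng):
    """Hypothetical helper: permute each attribute group independently within each class.

    Attributes in the same group are moved together, so interactions inside a
    group are preserved; interactions between different groups are broken.
    Assumption: this within-class permutation is an illustrative choice only.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    X_new = X.copy()
    for label in np.unique(y):
        rows = np.flatnonzero(y == label)      # row indices of this class
        for group in partition:
            perm = rng.permutation(rows)       # fresh permutation per group
            X_new[np.ix_(rows, group)] = X[np.ix_(perm, group)]
    return X_new

# Toy data: attributes 0 and 1 are identical, attribute 2 is unrelated.
rng = np.random.default_rng(0)
a = rng.integers(0, 100, size=40)
X = np.column_stack([a, a, rng.integers(0, 100, size=40)])
y = np.repeat([0, 1], 20)

# Grouping attributes 0 and 1 keeps their interaction intact.
X_grouped = shuffle_within_class(X, y, [[0, 1], [2]], rng)
print(np.array_equal(X_grouped[:, 0], X_grouped[:, 1]))
```

Under this sketch, a classifier trained on the original data and evaluated on shuffled data should lose little accuracy when the partition keeps all class-relevant interactions within groups, and should lose more when the partition splits interacting attributes apart.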
