Cascaded High Dimensional Histograms: A Generative Approach to Density Estimation
- TPM

We consider the problem of interpretable density estimation for high dimensional categorical data. In one or two dimensions, we would naturally consider histograms (bar charts) for simple density estimation problems. However, histograms do not scale to higher dimensions in an interpretable way, and one cannot usually visualize a high dimensional histogram. This work presents an alternative to the histogram for higher dimensions that can be directly visualized. These density models are in the form of a cascaded set of conditions (a tree structure), where each node in the tree is estimated to have constant density. We present three models for this task, where the first one allows the user to specify the number of desired leaves in the tree as a Bayesian prior. The second model allows the user to specify the desired number of branches within the prior. The third model allows the user to specify the desired number of rules and the length of rules within the prior and returns a list. Our results indicate that the new approach yields sparser trees than other approaches that achieve similar test performance.
View on arXiv