Clusters and features from combinatorial stochastic processes

In partitioning---a.k.a. clustering---data, we associate each data point with exactly one of a collection of groups called clusters or partition blocks. Here, we formally develop an analogous problem, called feature allocation, in which each data point is associated with an arbitrary non-negative integer number of groups, now called features or topics. We review known combinatorial stochastic process representations of clustering and develop analogous representations for the feature allocation case. We illustrate the clustering representations with examples that include the canonical nonparametric Bayesian clustering prior: the Chinese restaurant process or Dirichlet process. We not only illustrate the feature allocation representations with the canonical nonparametric Bayesian feature prior---the Indian buffet process or beta process---but also simultaneously discover new connections between the different representations for the Indian buffet process. We thereby bring the same level of completeness to the treatment of the Indian buffet that has previously been developed for the Chinese restaurant.
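
To make the contrast between a partition and a feature allocation concrete, the following is a minimal Python sketch of the standard sequential sampling schemes for the one-parameter Chinese restaurant process and Indian buffet process. It is an illustration only, not code from the paper: the function names, the parameter names `alpha` and `gamma`, and the simple Poisson sampler are assumptions made here.

```python
import math
import random


def sample_crp(n, alpha, rng):
    """Sample a partition of {0, ..., n-1} from a Chinese restaurant process.

    Point i joins an existing block with probability proportional to the
    block's size, or starts a new block with probability proportional to alpha.
    """
    blocks = []
    for i in range(n):
        u = rng.random() * (i + alpha)  # total weight: i seated points + alpha
        acc = 0.0
        for block in blocks:
            acc += len(block)
            if u < acc:
                block.append(i)
                break
        else:
            # No existing block was chosen: open a new one.
            blocks.append([i])
    return blocks


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_ibp(n, gamma, rng):
    """Sample a feature allocation of {0, ..., n-1} from an Indian buffet process.

    Point i (the (i+1)-th customer) takes each existing feature with
    probability (feature size) / (i + 1), then adds Poisson(gamma / (i + 1))
    brand-new features.
    """
    features = []
    for i in range(n):
        for members in features:
            if rng.random() < len(members) / (i + 1):
                members.append(i)
        for _ in range(sample_poisson(gamma / (i + 1), rng)):
            features.append([i])
    return features


if __name__ == "__main__":
    rng = random.Random(0)
    print("partition:", sample_crp(10, 1.0, rng))
    print("feature allocation:", sample_ibp(10, 2.0, rng))
```

In the partition, every index appears in exactly one block. In the feature allocation, an index may appear in several features or in none.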