Exact sampling of determinantal point processes with sublinear time preprocessing

Michal Derezinski
Daniele Calandriello
Michal Valko
Abstract

We study the complexity of sampling from a distribution over all index subsets of the set $\{1,...,n\}$ with the probability of a subset $S$ proportional to the determinant of the submatrix $\mathbf{L}_S$ of some $n\times n$ p.s.d. matrix $\mathbf{L}$, where $\mathbf{L}_S$ corresponds to the entries of $\mathbf{L}$ indexed by $S$. Known as a determinantal point process, this distribution is used in machine learning to induce diversity in subset selection. In practice, we often wish to sample multiple subsets $S$ with small expected size $k = E[|S|] \ll n$ from a very large matrix $\mathbf{L}$, so it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). For this purpose, we propose a new algorithm which, given access to $\mathbf{L}$, samples exactly from a determinantal point process while satisfying the following two properties: (1) its preprocessing cost is $n \cdot \text{poly}(k)$, i.e., sublinear in the size of $\mathbf{L}$, and (2) its sampling cost is $\text{poly}(k)$, i.e., independent of the size of $\mathbf{L}$. Prior to our results, state-of-the-art exact samplers required $O(n^3)$ preprocessing time and sampling time linear in $n$ or dependent on the spectral properties of $\mathbf{L}$. We also give a reduction which allows using our algorithm for exact sampling from cardinality constrained determinantal point processes with $n\cdot\text{poly}(k)$ time preprocessing.
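To make the definition concrete, the following is a minimal brute-force sketch of the distribution itself, not the algorithm proposed in the paper. It enumerates all $2^n$ subsets, so it is only usable for tiny $n$. The function names (`det`, `submatrix`, `dpp_probabilities`, `sample_dpp_bruteforce`) and the example matrix are illustrative assumptions. The normalizing constant it computes equals $\det(\mathbf{I} + \mathbf{L})$, a standard identity for the sum of all principal minors.

```python
from itertools import combinations
import random

def det(m):
    """Determinant via Gaussian elimination with partial pivoting.
    The determinant of the empty (0x0) matrix is 1."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d

def submatrix(L, S):
    """Entries of L whose row and column indices both lie in S."""
    return [[L[i][j] for j in S] for i in S]

def dpp_probabilities(L):
    """Map every subset S (as a sorted tuple) to P(S) = det(L_S) / Z."""
    n = len(L)
    weights = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            weights[S] = det(submatrix(L, S))
    Z = sum(weights.values())  # equals det(I + L)
    return {S: w / Z for S, w in weights.items()}

def sample_dpp_bruteforce(L, rng=random):
    """Draw one exact sample by enumeration; exponential in n."""
    probs = dpp_probabilities(L)
    subsets = list(probs)
    return rng.choices(subsets, weights=[probs[S] for S in subsets], k=1)[0]

# Hypothetical 3x3 p.s.d. matrix: items 0 and 2 are uncorrelated,
# while item 1 is correlated with both.
L = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
probs = dpp_probabilities(L)
expected_size = sum(len(S) * p for S, p in probs.items())  # k = E[|S|]
sample = sample_dpp_bruteforce(L)
```

In this example the uncorrelated pair `(0, 2)` receives more probability than the correlated pair `(0, 1)`, which illustrates the diversity-inducing behavior described above.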
