Exact sampling of determinantal point processes with sublinear time preprocessing

31 May 2019

Daniele Calandriello

Abstract

We study the complexity of sampling from a distribution over all index subsets of the set $\{1,...,n\}$ with the probability of a subset $S$ proportional to the determinant of the submatrix $\mathbf{L}_S$ of some $n\times n$ p.s.d. matrix $\mathbf{L}$ , where $\mathbf{L}_S$ corresponds to the entries of $\mathbf{L}$ indexed by $S$ . Known as a determinantal point process, this distribution is used in machine learning to induce diversity in subset selection. In practice, we often wish to sample multiple subsets $S$ with small expected size $k = E[|S|] \ll n$ from a very large matrix $\mathbf{L}$ , so it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). For this purpose, we propose a new algorithm which, given access to $\mathbf{L}$ , samples exactly from a determinantal point process while satisfying the following two properties: (1) its preprocessing cost is $n \cdot \text{poly}(k)$ , i.e., sublinear in the size of $\mathbf{L}$ , and (2) its sampling cost is $\text{poly}(k)$ , i.e., independent of the size of $\mathbf{L}$ . Prior to our results, state-of-the-art exact samplers required $O(n^3)$ preprocessing time and sampling time linear in $n$ or dependent on the spectral properties of $\mathbf{L}$ . We also give a reduction which allows using our algorithm for exact sampling from cardinality constrained determinantal point processes with $n\cdot\text{poly}(k)$ time preprocessing.

View on arXiv

Comments on this paper