We consider the problem of estimating the support size of a discrete distribution whose minimum non-zero mass is at least $\frac{1}{k}$. Under the independent sampling model, we show that the sample complexity, i.e., the minimal sample size needed to achieve an additive error of $\epsilon k$ with probability at least 0.1, is within universal constant factors of $\frac{k}{\log k}\log^2\frac{1}{\epsilon}$, which improves on the state-of-the-art result of $\frac{k}{\epsilon^2 \log k}$ in \cite{VV13}. A similar characterization of the minimax risk is also obtained. Our procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties; it can be evaluated in $O(n+\log^2 k)$ time and attains the sample complexity within a factor of six asymptotically. The superiority of the proposed estimator in terms of accuracy, computational efficiency, and scalability is demonstrated on a variety of synthetic and real datasets.
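To make the idea of a linear estimator built from a Chebyshev polynomial concrete, the following is a minimal sketch and not the authors' reference implementation. It assumes that `k` is the inverse of the lower bound on the minimum non-zero mass and `n` is the sample size. It further assumes the polynomial is a shifted, rescaled Chebyshev polynomial of degree `L = floor(c0 * log k)` on the interval `[1/k, c1 * log(k) / n]`, normalized to equal -1 at zero, and that a symbol observed `j <= L` times receives weight `a_j * j! / n^j + 1` while more frequent symbols receive weight 1. The function names and the constants `c0 = 0.558` and `c1 = 0.5` are illustrative assumptions, not values stated in the abstract.

```python
import math
from collections import Counter


def _mul_linear(p, alpha, beta):
    """Multiply the polynomial p(x) (coefficients low to high) by (alpha + beta*x)."""
    out = [0.0] * (len(p) + 1)
    for i, v in enumerate(p):
        out[i] += alpha * v
        out[i + 1] += beta * v
    return out


def chebyshev_weights(k, n, c0=0.558, c1=0.5):
    """Weights g[0..L] applied to symbols seen j times (assumed construction).

    c0 and c1 are illustrative tuning constants, not taken from the abstract.
    """
    L = max(1, int(c0 * math.log(k)))
    l, r = 1.0 / k, c1 * math.log(k) / n
    if r <= l:
        raise ValueError("interval [1/k, c1*log(k)/n] is empty; n is too large for this sketch")
    # Affine map x -> t = alpha + beta*x sends [l, r] onto [1, -1].
    alpha, beta = (r + l) / (r - l), -2.0 / (r - l)
    # Build T_L(alpha + beta*x) in the monomial basis of x using the
    # recurrence T_{m+1}(t) = 2*t*T_m(t) - T_{m-1}(t).
    prev, cur = [1.0], [alpha, beta]
    for _ in range(L - 1):
        nxt = [2.0 * v for v in _mul_linear(cur, alpha, beta)]
        for i, v in enumerate(prev):
            nxt[i] -= v
        prev, cur = cur, nxt
    # Normalize so the polynomial equals -1 at x = 0 (cur[0] is T_L(alpha)).
    a = [-v / cur[0] for v in cur]
    return [a[j] * math.factorial(j) / n**j + 1.0 for j in range(L + 1)]


def estimate_support_size(samples, k):
    """Linear estimator: sum of per-symbol weights over the observed symbols."""
    n = len(samples)
    g = chebyshev_weights(k, n)
    L = len(g) - 1
    counts = Counter(samples)
    # Symbols seen more than L times are counted once, as in the plug-in estimator.
    return sum(g[c] if c <= L else 1.0 for c in counts.values())
```

Under these assumptions the weight for a count of zero is exactly 0, so unseen symbols never need to be enumerated, and the estimator differs from simply counting distinct symbols only through the weights on rarely observed symbols. Counting takes one pass over the sample and building the weights takes on the order of `L^2` arithmetic operations.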