
arXiv:1902.05616
Dualizing Le Cam's method for functional estimation, with applications to estimating the unseens

14 February 2019
Yury Polyanskiy
Yihong Wu
Abstract

Le Cam's method (or the two-point method) is a commonly used tool for obtaining statistical lower bounds, and it is especially popular for functional estimation problems. This work aims to explain and give conditions for the tightness of Le Cam's lower bound in functional estimation from the perspective of convex duality. Under a variety of settings it is shown that the maximization problem that searches for the best two-point lower bound, upon dualizing, becomes a minimization problem that optimizes the bias-variance tradeoff among a family of estimators. For estimating linear functionals of a distribution, our work strengthens prior results of Donoho-Liu \cite{DL91} (for quadratic loss) by dropping the H\"olderian assumption on the modulus of continuity. For exponential families, our results extend those of Juditsky-Nemirovski \cite{JN09} by characterizing the minimax risk for the quadratic loss under weaker assumptions on the exponential family. We also provide an extension to the high-dimensional setting for estimating separable functionals. Notably, coupled with tools from complex analysis, this method is particularly effective for characterizing the ``elbow effect'' -- the phase transition from parametric to nonparametric rates. As the main application we derive sharp minimax rates in the Distinct elements problem (given a fraction $p$ of colored balls from an urn containing $d$ balls, the optimal error of estimating the number of distinct colors is $\tilde\Theta(d^{-\frac{1}{2}\min\{\frac{p}{1-p},1\}})$) and Fisher's species problem (given $n$ iid observations from an unknown distribution, the optimal prediction error for the number of unseen symbols in the next (unobserved) $r \cdot n$ observations is $\tilde\Theta(n^{-\min\{\frac{1}{r+1},\frac{1}{2}\}})$).
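To make the Distinct elements setup concrete, here is a minimal simulation sketch of the problem as stated in the abstract: an urn of $d$ balls is sampled uniformly at a fraction $p$, and the goal is to estimate the number of distinct colors in the full urn. The naive plug-in estimator (count the distinct colors seen in the sample) is biased downward, which is what motivates nontrivial estimators and matching lower bounds. All names, parameters, and the urn configuration below are illustrative assumptions, not the paper's construction or estimator.

```python
import random

def naive_distinct_estimate(urn, p, rng):
    """Count distinct colors in a uniform sample of a fraction p of the urn.

    This is the naive plug-in estimator, shown here only to illustrate its
    downward bias; it is not the minimax-optimal procedure from the paper.
    """
    k = int(p * len(urn))
    return len(set(rng.sample(urn, k)))

rng = random.Random(0)
d = 10_000
urn = list(range(d // 2)) * 2     # 5000 distinct colors, each appearing twice
true_distinct = len(set(urn))     # 5000

naive = naive_distinct_estimate(urn, p=0.2, rng=rng)
print(f"true: {true_distinct}, naive estimate: {naive}")
```

Because each color appears only twice, a 20% sample misses both copies of many colors, so the naive count falls well below the truth; closing this gap at the optimal rate $\tilde\Theta(d^{-\frac{1}{2}\min\{\frac{p}{1-p},1\}})$ is the content of the paper's main application.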
