
Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics

Abstract

Given a sample of size $n$ from a population of individuals belonging to different species with unknown proportions, a popular problem of practical interest consists in making inference on the probability $D_{n}(l)$ that the $(n+1)$-th draw coincides with a species with frequency $l$ in the sample, for any $l=0,1,\ldots,n$. This paper contributes to the methodology of Bayesian nonparametric inference for $D_{n}(l)$. Specifically, under the general framework of Gibbs-type priors, we show how to derive credible intervals for a Bayesian nonparametric estimator of $D_{n}(l)$, and we investigate the large-$n$ asymptotic behaviour of such an estimator. Of particular interest are the special cases of our results obtained under the two parameter Poisson--Dirichlet prior and the normalized generalized Gamma prior, which are two of the most commonly used Gibbs-type priors. For these two prior specifications, the proposed results are illustrated through a simulation study and a benchmark Expressed Sequence Tags dataset. To the best of our knowledge, this illustration provides the first comparative study between the two parameter Poisson--Dirichlet prior and the normalized generalized Gamma prior in the context of Bayesian nonparametric inference for $D_{n}(l)$.
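
As a concrete illustration of the point estimator whose uncertainty the credible intervals quantify, the sketch below computes the standard closed-form posterior mean of $D_{n}(l)$ under a two parameter Poisson--Dirichlet prior with discount $\sigma \in [0,1)$ and strength $\theta > -\sigma$: $\hat{D}_{n}(0) = (\theta + \sigma K_n)/(\theta + n)$ and $\hat{D}_{n}(l) = (l - \sigma) m_l/(\theta + n)$ for $l \ge 1$, where $K_n$ is the number of distinct species in the sample and $m_l$ is the number of species with frequency $l$. This is a minimal sketch assuming those known formulas. It does not reproduce the paper's credible intervals or asymptotics, and the function name `pd_discovery_estimates` is hypothetical.

```python
from collections import Counter


def pd_discovery_estimates(sample, sigma, theta):
    """Posterior-mean estimates of D_n(l), l = 0..n, under a two parameter
    Poisson--Dirichlet prior (sketch; hypothetical helper, not from the paper)."""
    if not (0 <= sigma < 1 and theta > -sigma):
        raise ValueError("need 0 <= sigma < 1 and theta > -sigma")
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")

    freqs = Counter(sample)          # species -> frequency in the sample
    k = len(freqs)                   # K_n: number of distinct species
    m = Counter(freqs.values())      # m[l]: number of species with frequency l

    est = [0.0] * (n + 1)
    # l = 0: probability that draw n+1 is a new (unseen) species
    est[0] = (theta + sigma * k) / (theta + n)
    # l >= 1: each of the m_l species with frequency l has weight (l - sigma)/(theta + n)
    for l, m_l in m.items():
        est[l] = (l - sigma) * m_l / (theta + n)
    return est


est = pd_discovery_estimates(["a", "a", "b", "c", "c", "c"], sigma=0.5, theta=1.0)
print([round(x, 4) for x in est])
```

Since $\sum_{l \ge 1} l\, m_l = n$ and $\sum_{l \ge 1} m_l = K_n$, the estimates sum to $(\theta + \sigma K_n + n - \sigma K_n)/(\theta + n) = 1$ over $l = 0,\ldots,n$, which is a convenient sanity check.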
