
Microbial Composition Estimation from Sparse Count Data

Abstract

Under-sampling and limited sequencing depth in microbiome studies often lead to sparse count data and biased estimates of microbial richness and diversity. This paper proposes a general framework for estimating the composition matrix from high-dimensional sparse count data, in which a Poisson-multinomial model is used to model the read counts from metagenomic sequencing. A regularized maximum likelihood estimator is proposed to estimate the underlying composition matrix under an approximately low-rank assumption. Nearly matching theoretical upper and lower bounds on the estimation error are established in both Kullback-Leibler divergence and Frobenius norm. Simulation studies demonstrate that the regularized maximum likelihood estimator outperforms estimators commonly used in the previous literature. The method is further illustrated by an application to a human gut microbiome study.
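The pipeline summarized above (Poisson-multinomial counts, a low-rank composition matrix, a regularized likelihood, and errors measured in Kullback-Leibler divergence and Frobenius norm) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' algorithm. The nuclear-norm penalty standing in for "regularized", the pseudocount baseline, the entry floor, the backtracking rule, the tuning value lam, the Dirichlet-based simulation design, and every function name (simulate_low_rank_counts, regularized_mle, mean_kl, and so on) are hypothetical choices made for illustration. Each iteration takes a gradient step on the multinomial negative log-likelihood, soft-thresholds the singular values, then clips and renormalizes the rows. Because that last step is not an exact projection, the scheme is a heuristic that only guarantees the penalized objective never increases.

```python
import numpy as np


def simulate_low_rank_counts(n, p, rank, mean_depth, seed=0):
    """Hypothetical simulation: low-rank composition matrix Z plus Poisson-multinomial counts.

    Assumed model: N_i ~ Poisson(mean_depth) and X_i | N_i ~ Multinomial(N_i, Z_i),
    where Z = W @ H with row-stochastic W (n x rank) and H (rank x p).
    """
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(rank), size=n)
    H = rng.dirichlet(0.3 * np.ones(p), size=rank)  # small alpha -> many rare taxa
    Z = W @ H  # rows sum to 1, rank <= `rank`
    depths = rng.poisson(mean_depth, size=n)
    X = np.vstack([rng.multinomial(N, z) for N, z in zip(depths, Z)])
    return Z, X


def naive_estimate(X, pseudocount=0.5):
    """Baseline: add a pseudocount to every cell, then normalize each row."""
    Xp = X + pseudocount
    return Xp / Xp.sum(axis=1, keepdims=True)


def svt(M, tau):
    """Singular value soft-thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def project_rows(Z, floor):
    """Clip entries below `floor`, then rescale rows to sum to 1 (heuristic, not exact)."""
    Z = np.maximum(Z, floor)
    return Z / Z.sum(axis=1, keepdims=True)


def objective(X, Z, lam):
    """Multinomial negative log-likelihood per read plus lam * nuclear norm of Z."""
    nll = -np.sum(X * np.log(Z)) / X.sum()
    return nll + lam * np.linalg.norm(Z, ord="nuc")


def regularized_mle(X, lam, floor=1e-6, n_iter=200, tol=1e-9):
    """Heuristic nuclear-norm-penalized multinomial MLE (illustrative, not the paper's solver)."""
    total = X.sum()
    Z = project_rows(naive_estimate(X), floor)
    history = [objective(X, Z, lam)]
    for _ in range(n_iter):
        grad = -X / (total * Z)  # gradient of the averaged negative log-likelihood
        t, accepted = 1.0, False
        for _ in range(40):  # backtrack until the penalized objective does not increase
            cand = project_rows(svt(Z - t * grad, t * lam), floor)
            val = objective(X, cand, lam)
            if val <= history[-1]:
                accepted = True
                break
            t /= 2.0
        if not accepted:
            break
        Z = cand
        history.append(val)
        if history[-2] - val < tol:
            break
    return Z, history


def mean_kl(Z_true, Z_hat):
    """(1/n) * sum_i KL(Z_true[i] || Z_hat[i]), skipping zero entries of Z_true."""
    mask = Z_true > 0
    return np.sum(Z_true[mask] * np.log(Z_true[mask] / Z_hat[mask])) / Z_true.shape[0]


def frobenius_error(Z_true, Z_hat):
    """Frobenius norm of the estimation error."""
    return np.linalg.norm(Z_true - Z_hat, ord="fro")


if __name__ == "__main__":
    Z_true, X = simulate_low_rank_counts(n=60, p=40, rank=3, mean_depth=100)
    Z_reg, _ = regularized_mle(X, lam=0.05)
    for name, Z_hat in [("naive", naive_estimate(X)), ("regularized", Z_reg)]:
        print(name, mean_kl(Z_true, Z_hat), frobenius_error(Z_true, Z_hat))
```

Renormalizing rows after thresholding keeps every estimated row a valid composition with strictly positive entries, so the log-likelihood and the KL error stay finite. The cost is that the update is no longer a textbook proximal-gradient step. The printed errors depend on the hypothetical simulation settings and lam, and they are not a reproduction of the paper's simulation results.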
