Several recent works have considered the \emph{trace reconstruction problem}, in which an unknown source string $x \in \{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a \emph{trace} of $x$. The goal is to reconstruct the original string~$x$ from independent traces of $x$. While the best algorithms known for worst-case strings use $\exp(O(n^{1/3}))$ traces \cite{DOS17,NazarovPeres17}, highly efficient algorithms are known \cite{PZ17,HPP18} for the \emph{average-case} version, in which $x$ is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call \emph{average-case population recovery in the presence of insertions and deletions}. In this problem, there is an unknown distribution $\mathcal{D}$ over $s$ unknown source strings $x^1,\dots,x^s \in \{0,1\}^n$, and each sample is independently generated by drawing some $x^i$ from $\mathcal{D}$ and returning an independent trace of $x^i$. Building on \cite{PZ17} and \cite{HPP18}, we give an efficient algorithm for this problem. For any support size $s \leq \exp(\Theta(n^{1/3}))$, for a $1-o(1)$ fraction of all $s$-element support sets $\{x^1,\dots,x^s\} \subset \{0,1\}^n$, for every distribution $\mathcal{D}$ supported on $\{x^1,\dots,x^s\}$, our algorithm efficiently recovers $\mathcal{D}$ up to total variation distance $\epsilon$ with high probability, given access to independent traces of independent draws from $\mathcal{D}$. The algorithm runs in time $\mathrm{poly}(n,s,1/\epsilon)$ and its sample complexity is $\mathrm{poly}(s,1/\epsilon,\exp(\log^{1/3} n))$. This polynomial dependence on the support size $s$ is in sharp contrast with the \emph{worst-case} version (when $x^1,\dots,x^s$ may be any strings in $\{0,1\}^n$), in which the sample complexity of the most efficient known algorithm \cite{BCFSS19} is doubly exponential in $s$.
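To make the sampling model concrete, the following is a minimal Python sketch of how one sample might be generated, together with the total variation distance used as the recovery criterion. It is an illustration only, not the algorithm of this work. The function names, the parameters `delete_prob` and `insert_prob`, and the particular channel convention (a geometric number of uniformly random bits inserted before each source bit, followed by independent deletion of that bit) are assumptions made for this example.

```python
import random


def trace(x, delete_prob, insert_prob, rng):
    """Pass the bit-string x through an assumed insertion/deletion channel."""
    out = []
    for bit in x:
        # Assumed convention: insert a geometric number of uniformly
        # random bits before each source bit.
        while rng.random() < insert_prob:
            out.append(rng.choice("01"))
        # Keep the source bit unless it is deleted (probability delete_prob).
        if rng.random() >= delete_prob:
            out.append(bit)
    return "".join(out)


def draw_sample(support, weights, delete_prob, insert_prob, rng):
    """Draw a source string from the population, then return one trace of it."""
    x = rng.choices(support, weights=weights, k=1)[0]
    return trace(x, delete_prob, insert_prob, rng)


def tv_distance(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    rng = random.Random(0)
    n, s = 20, 3
    # Average-case setting: the support strings are uniformly random.
    support = ["".join(rng.choice("01") for _ in range(n)) for _ in range(s)]
    weights = [0.5, 0.3, 0.2]
    for _ in range(5):
        print(draw_sample(support, weights, 0.1, 0.1, rng))
```

A learner in this model sees only the outputs of `draw_sample` and never the identity of the drawn source string; its estimate of the distribution is judged by `tv_distance` against the true one.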