
Impossibility of phylogeny reconstruction from k-mer counts

Abstract

We consider phylogeny estimation under a two-state model of sequence evolution by site substitution on a tree. In the asymptotic regime where the sequence lengths tend to infinity, we show that, for any fixed k, no statistically consistent phylogeny estimation is possible from k-mer counts over the full leaf sequences alone. Formally, we establish that the joint distributions of k-mer counts over the entire leaf sequences on two distinct trees have a total variation distance bounded away from 1 as the sequence length tends to infinity. Our impossibility result implies that statistical consistency requires a more sophisticated use of k-mer count information, such as the block techniques developed in previous theoretical work.
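The following Python sketch illustrates the data the statement is about. It generates leaf sequences from a two-state substitution model on a tree and computes the k-mer counts of those sequences. The sketch rests on one assumption about the model. The root state is uniform on {0, 1}, and each site flips independently along each edge with an edge-specific probability (a Cavender-Farris-Neyman-style model). The paper's exact parametrization may differ. The tree format and the names `simulate_leaves`, `kmer_counts`, `total_variation`, `TREE_AB_CD`, and `TREE_AC_BD` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def simulate_leaves(tree, root, length, rng):
    """Simulate binary leaf sequences under an assumed symmetric two-state model.

    Hypothetical tree format: `tree` maps each internal node to a list of
    (child, p) pairs, where p is the per-site flip probability on that edge.
    """
    # Each root site is drawn uniformly from {0, 1}, the stationary distribution.
    seqs = {root: [rng.randint(0, 1) for _ in range(length)]}
    leaves = {}
    stack = [root]
    while stack:
        node = stack.pop()
        children = tree.get(node, [])
        if not children:
            # Nodes without children are leaves; only their sequences are observed.
            leaves[node] = seqs[node]
            continue
        for child, p in children:
            # Sites evolve independently: each one flips with probability p.
            seqs[child] = [s ^ (rng.random() < p) for s in seqs[node]]
            stack.append(child)
    return leaves


def kmer_counts(seq, k):
    """Count all overlapping length-k windows of a 0/1 sequence."""
    return Counter(
        "".join(str(b) for b in seq[i:i + k]) for i in range(len(seq) - k + 1)
    )


def total_variation(p, q):
    """TV(P, Q) = 1/2 * sum_x |P(x) - Q(x)| for finite distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Hypothetical example: two quartet topologies, ((A,B),(C,D)) and ((A,C),(B,D)),
# with the same flip probability on every edge.
TREE_AB_CD = {"r": [("u", 0.1), ("v", 0.1)],
              "u": [("A", 0.1), ("B", 0.1)],
              "v": [("C", 0.1), ("D", 0.1)]}
TREE_AC_BD = {"r": [("u", 0.1), ("v", 0.1)],
              "u": [("A", 0.1), ("C", 0.1)],
              "v": [("B", 0.1), ("D", 0.1)]}

if __name__ == "__main__":
    rng = random.Random(0)
    for name, tree in [("AB|CD", TREE_AB_CD), ("AC|BD", TREE_AC_BD)]:
        leaves = simulate_leaves(tree, "r", 1000, rng)
        # Per-leaf 2-mer counts: the only data a k-mer-count estimator would see.
        print(name, {leaf: dict(kmer_counts(s, 2)) for leaf, s in sorted(leaves.items())})
```

Running the script prints the 2-mer counts of each leaf under both topologies. The result states that, for fixed k and some pair of distinct trees, the distributions of such counts do not separate as the sequence length grows. Consequently, no procedure that uses the counts alone can tell those trees apart with vanishing error. The sketch only produces the data and does not reproduce the paper's argument. It also makes no claim that these particular trees and flip probabilities are the ones the paper uses. `total_variation` is included only to spell out the distance in the statement for distributions small enough to tabulate.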
