Differentially Private Approximate Pattern Matching

In this paper, we consider the $k$-approximate pattern matching problem under differential privacy, where the goal is to report or count all substrings of a given string $S$ which have a Hamming distance at most $k$ to a pattern $P$, or decide whether such a substring exists. In our definition of privacy, individual positions of the string $S$ are protected. To be able to answer queries under differential privacy, we allow some slack on $k$, i.e., we allow reporting or counting substrings of $S$ with a distance at most $(1+\gamma)k+\alpha$ to $P$, for a multiplicative error $\gamma$ and an additive error $\alpha$. We analyze which values of $\alpha$ and $\gamma$ are necessary or sufficient to solve the $k$-approximate pattern matching problem while satisfying $\epsilon$-differential privacy. Let $n$ denote the length of $S$. We give 1) an $\epsilon$-differentially private algorithm with an additive error of and no multiplicative error for the existence variant; 2) an $\epsilon$-differentially private algorithm with an additive error for the counting variant; 3) an $\epsilon$-differentially private algorithm with an additive error of and multiplicative error for the reporting variant for a special class of patterns. The error bounds hold with high probability. All of these algorithms return a witness, that is, if there exists a substring of $S$ with distance at most $k$ to $P$, then the algorithm returns a substring of $S$ with distance at most $(1+\gamma)k+\alpha$ to $P$. Further, we complement these results by a lower bound, showing that any algorithm for the existence variant which also returns a witness must have an additive error of with constant probability.
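To make the three problem variants concrete, the following is a minimal, non-authoritative Python sketch. It is not the algorithm of the paper. The function names, the restriction to substrings of the same length as the pattern, and the relaxed threshold of the form `(1 + gamma) * k + alpha` are assumptions made for illustration. The last function applies the standard Laplace mechanism to the minimum window distance: changing one position of the string changes each window's Hamming distance by at most 1, so the minimum has sensitivity 1. It releases only a noisy distance and does not return a witness.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def window_distances(s, p):
    """Hamming distance of p to every length-len(p) substring of s."""
    m = len(p)
    return [hamming(s[i:i + m], p) for i in range(len(s) - m + 1)]


def relaxed_threshold(k, gamma=0.0, alpha=0.0):
    """Assumed slack form: multiplicative error gamma, additive error alpha."""
    return (1 + gamma) * k + alpha


def report(s, p, threshold):
    """Reporting variant: start indices of substrings within the threshold."""
    return [i for i, d in enumerate(window_distances(s, p)) if d <= threshold]


def count(s, p, threshold):
    """Counting variant: number of substrings within the threshold."""
    return len(report(s, p, threshold))


def exists(s, p, threshold):
    """Existence variant: is any substring within the threshold?"""
    return count(s, p, threshold) > 0


def noisy_min_distance(s, p, eps, rng=random):
    """Minimum window distance plus Laplace noise of scale 1/eps.

    Requires len(s) >= len(p). The difference of two independent
    exponentials with rate eps is Laplace-distributed with scale 1/eps.
    """
    noise = rng.expovariate(eps) - rng.expovariate(eps)
    return min(window_distances(s, p)) + noise
```

For example, `report("abracadabra", "abra", relaxed_threshold(0))` returns the exact occurrences, while a larger `alpha` admits further substrings. The first three variants are exact and offer no privacy; only `noisy_min_distance` adds noise.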