
Approximation Schemes for Low-Rank Binary Matrix Approximation Problems

Abstract

We provide a randomized linear-time approximation scheme for a generic problem about clustering of binary vectors subject to additional constraints. The new constrained clustering problem encompasses a number of problems, and by solving it we obtain the first linear-time approximation schemes for several well-studied fundamental problems concerning clustering of binary vectors and low-rank approximation of binary matrices. Among the problems solvable by our approach are \textsc{Low GF(2)-Rank Approximation}, \textsc{Low Boolean-Rank Approximation}, and various versions of \textsc{Binary Clustering}. For example, in the \textsc{Low GF(2)-Rank Approximation} problem, given an $m\times n$ binary matrix $A$ and an integer $r>0$, we seek a binary matrix $B$ of $GF_2$ rank at most $r$ such that the $\ell_0$ norm of the matrix $A-B$ is minimum. For any $\epsilon>0$, our algorithm runs in time $f(r,\epsilon)\cdot n\cdot m$, where $f$ is some computable function, and outputs a $(1+\epsilon)$-approximate solution with probability at least $(1-\frac{1}{e})$. Our approximation algorithms substantially improve the running times and approximation factors of previous works. We also give (deterministic) PTASes for these problems running in time $n^{f(r)\frac{1}{\epsilon^2}\log \frac{1}{\epsilon}}$, where $f$ is some function depending on the problem. Our algorithm for the constrained clustering problem is based on a novel sampling lemma, which is of independent interest.
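To make the \textsc{Low GF(2)-Rank Approximation} objective concrete, here is a minimal exact brute-force sketch in Python. It is not the paper's sampling-based approximation scheme; it only illustrates the clustering view of the problem. A binary matrix has GF(2)-rank at most r exactly when all of its rows lie in a subspace spanned by r vectors, and over GF(2) the entry-wise difference A-B is A XOR B, so its ℓ0 norm is a Hamming distance. Minimizing it therefore amounts to choosing r generators and sending every row of A to its nearest vector in their span. All function names (`gf2_rank`, `span`, `low_gf2_rank_bruteforce`) are hypothetical, and the search is exponential, so it is only suitable for tiny instances.

```python
from itertools import product


def to_bits(row):
    """Pack a 0/1 list into an int; bit j holds column j."""
    return sum(bit << j for j, bit in enumerate(row))


def hamming(x, y):
    """Number of differing positions between two packed binary vectors."""
    return bin(x ^ y).count("1")


def gf2_rank(M):
    """Rank of a binary matrix over GF(2), via XOR Gaussian elimination."""
    basis = {}  # leading bit -> basis vector having that leading bit
    for row in M:
        v = to_bits(row)
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]  # clears the leading bit, so v strictly decreases
    return len(basis)


def span(gens):
    """All GF(2) linear combinations of the generator vectors (as ints)."""
    vectors = {0}
    for g in gens:
        vectors |= {v ^ g for v in vectors}
    return vectors


def low_gf2_rank_bruteforce(A, r):
    """Exact minimum of ||A - B||_0 over binary B with GF(2)-rank <= r.

    Exhaustive search over all r-tuples of generator rows (2^(r*n) of them).
    """
    n = len(A[0])
    rows = [to_bits(row) for row in A]
    best_cost, best_centers = None, None
    # Every subspace of dimension <= r is the span of some r vectors in {0,1}^n.
    for gens in product(range(1 << n), repeat=r):
        S = span(gens)
        # Each row of B is the vector of S nearest to the matching row of A.
        centers = [min(S, key=lambda s, a=a: hamming(a, s)) for a in rows]
        cost = sum(hamming(a, c) for a, c in zip(rows, centers))
        if best_cost is None or cost < best_cost:
            best_cost, best_centers = cost, centers
    B = [[(c >> j) & 1 for j in range(n)] for c in best_centers]
    return best_cost, B


A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 1, 1]]
print(gf2_rank(A))                       # 2 (last row = first XOR third)
print(low_gf2_rank_bruteforce(A, 1)[0])  # 6
```

The exhaustive loop over generator tuples is what makes exact solving expensive even for fixed r; the point of an approximation scheme with running time $f(r,\epsilon)\cdot n\cdot m$ is to avoid this dependence on n in the exponent.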
