Communication-avoiding Cholesky-QR2 for rectangular matrices

Abstract

Scalable algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We address this need by presenting a new scalable QR factorization algorithm intended to accelerate these problems for rectangular matrices. Our contribution is a communication-avoiding distributed-memory parallelization of an existing Cholesky-based QR factorization algorithm called CholeskyQR2. Our algorithm executes on a 3D processor grid, the dimensions of which can be tuned to trade off costs in synchronization, interprocessor communication, computational work, and memory footprint. It improves the communication cost complexity with respect to state-of-the-art parallel QR implementations by a factor of $\Theta(P^{1/6})$, where $P$ is the number of processors. We implement the new 3D CholeskyQR2 algorithm and study its performance relative to ScaLAPACK on Stampede 2, an Intel Knights Landing cluster, demonstrating improvements in parallel scalability and absolute performance.
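
For orientation, the sequential CholeskyQR2 kernel that the abstract builds on can be sketched in a few lines of NumPy. This is only an illustrative single-node version, not the 3D distributed algorithm the paper contributes. The function names `cholesky_qr` and `cholesky_qr2` are hypothetical, and the sketch assumes a tall matrix with full column rank that is well conditioned enough for the Cholesky factorization of its Gram matrix to succeed.

```python
import numpy as np


def cholesky_qr(A):
    """One CholeskyQR pass: return Q, R with A = Q R and R upper triangular.

    Assumes A (m x n, m >= n) has full column rank and is not too
    ill-conditioned, since forming A^T A squares the condition number.
    """
    G = A.T @ A                    # n x n Gram matrix
    L = np.linalg.cholesky(G)      # G = L L^T, L lower triangular
    R = L.T                        # upper-triangular factor
    # Q = A R^{-1}, obtained by solving L X = A^T and transposing X.
    # A general solver is used here for brevity; a triangular solver
    # would exploit the structure of L.
    Q = np.linalg.solve(L, A.T).T
    return Q, R


def cholesky_qr2(A):
    """Two CholeskyQR passes; the second pass improves orthogonality of Q."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, R2 @ R1              # product of upper-triangular factors
```

In a distributed setting, the Gram matrix product and the triangular solve are the steps where communication arises; how these are mapped onto a 3D processor grid is the subject of the paper and is not shown in this sketch.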
