Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid

Abstract

A multigrid Full Approximation Storage (FAS) algorithm for deep residual networks is developed, enabling layer-parallel training of the network and concurrent execution of computational kernels on GPUs. This work demonstrates a 10.2x speedup over traditional layer-wise model parallelism using the same number of compute units.
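To make the idea concrete: the forward pass of a residual network, u_{l+1} = u_l + h f(u_l), can be read as forward-Euler time stepping, so all layer states can be solved for simultaneously with a nonlinear multigrid (FAS) cycle rather than strictly in sequence. The sketch below illustrates this on a toy problem, with F-relaxation sweeps that are independent per coarse interval (the source of the GPU concurrency), an injected FAS coarse problem with a tau-style correction, and a coarse-grid correction step. This is a minimal sketch under stated assumptions, not the paper's implementation: the model, the names (f, f_relax, fas_two_grid), and the two-level structure are hypothetical, and the paper applies the multigrid structure to training as a whole, not only to forward propagation.

import numpy as np

# Toy setup: a width-d residual network with L layers sharing one weight
# matrix W (a hypothetical stand-in chosen to keep the sketch short).
rng = np.random.default_rng(0)
d, L, cf, h = 4, 16, 4, 0.1        # width, fine layers, coarsening factor, step
W = rng.standard_normal((d, d)) / np.sqrt(d)

def f(u):
    # Residual-block nonlinearity; works on one state or a stack of states.
    return np.tanh(u @ W.T)

def residual(u):
    # Defect of the layer equations u_l - u_{l-1} - h*f(u_{l-1}) = 0.
    r = np.zeros_like(u)
    r[1:] = u[:-1] + h * f(u[:-1]) - u[1:]
    return r

def f_relax(u):
    # F-relaxation: propagate within each coarse interval from its C-point.
    # Intervals are mutually independent, so on a GPU they run concurrently;
    # this serial loop is the stand-in for that parallelism.
    for c in range(0, len(u) - 1, cf):
        for l in range(c, min(c + cf - 1, len(u) - 1)):
            u[l + 1] = u[l] + h * f(u[l])
    return u

def fas_two_grid(u, sweeps=8):
    # Two-level FAS cycle: fine step h, coarse step H = cf*h.
    H = cf * h
    for _ in range(sweeps):
        u = f_relax(u)
        r = residual(u)                   # nonzero only at C-points after F-relax
        uc, rc = u[::cf].copy(), r[::cf]  # restrict solution and defect by injection
        # FAS right-hand side: coarse operator applied to the restricted
        # solution plus the restricted fine defect (the tau correction).
        rhs = np.zeros_like(uc)
        rhs[1:] = (uc[1:] - uc[:-1] - H * f(uc[:-1])) + rc[1:]
        v = uc.copy()
        for k in range(len(uc) - 1):      # cheap sequential coarse solve
            v[k + 1] = v[k] + H * f(v[k]) + rhs[k + 1]
        u[::cf] += v - uc                 # coarse-grid correction at C-points
    return f_relax(u)

# Check against plain sequential forward propagation through all L layers.
u_seq = np.zeros((L + 1, d))
u_seq[0] = rng.standard_normal(d)
for l in range(L):
    u_seq[l + 1] = u_seq[l] + h * f(u_seq[l])

u = np.zeros((L + 1, d))
u[0] = u_seq[0]                           # same input, zero guess elsewhere
u = fas_two_grid(u)
print("max deviation from sequential propagation:", np.abs(u - u_seq).max())

Each F-relaxation sweep touches all coarse intervals at once, which is where the concurrent kernel execution comes from; the sequential coarse solve stays cheap because it has cf times fewer layers than the fine level.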
