Once-for-All: Train One Network and Specialize it for Efficient Deployment on Diverse Hardware Platforms

International Conference on Learning Representations (ICLR), 2019

26 August 2019

Chuang Gan

Song Han

ArXiv (abs)PDF HTML Github (1916★)

Abstract

We address the challenging problem of efficient deep learning model deployment across many devices and diverse constraints, from general-purpose hardware to specialized accelerators. Conventional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally prohibitive (causing $CO_2$ emission as much as 5 cars' lifetime) thus unscalable. To reduce the cost, our key idea is to decouple model training from architecture search. To this end, we propose to train a once-for-all network (OFA) that supports diverse architectural settings (depth, width, kernel size, and resolution). Given a deployment scenario, we can then quickly get a specialized sub-network by selecting from the OFA network without additional training. To prevent interference between many sub-networks during training, we also propose a novel progressive shrinking algorithm, which can train a surprisingly large number of sub-networks ( $> 10^{19}$ ) simultaneously. Extensive experiments on various hardware platforms (CPU, GPU, mCPU, mGPU, FPGA accelerator) show that OFA consistently outperforms SOTA NAS methods (up to 4.0% ImageNet top1 accuracy improvement over MobileNetV3) while reducing orders of magnitude GPU hours and $CO_2$ emission. In particular, OFA achieves a new SOTA 80.0% ImageNet top1 accuracy under the mobile setting ( $<$ 600M FLOPs). Code and pre-trained models are released at https://github.com/mit-han-lab/once-for-all.

View on arXiv

Comments on this paper