Pipe-BD: Pipelined Parallel Blockwise Distillation

29 January 2023

Papers citing "Pipe-BD: Pipelined Parallel Blockwise Distillation"

Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces
Bert Moons, Parham Noorzad, Andrii Skliar, G. Mariani, Dushyant Mehta, Chris Lott, Tijmen Blankevoort
145 | 43 | 0 | 16 Dec 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE | 245 | 1,821 | 0 | 17 Sep 2019

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam
3DH | 950 | 20,567 | 0 | 17 Apr 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL | 308 | 2,890 | 0 | 15 Sep 2016