Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

27 May 2019

Boris Ginsburg, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Jason Chun Lok Li, Jonathan M. Cohen

Papers citing "Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks"


How to Fine-Tune Vision Models with SGD. Ananya Kumar, Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar. 17 Nov 2022.
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape. Devansh Bisla, Jing Wang, A. Choromańska. 20 Jan 2022.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro. 17 Sep 2019.
Bag of Tricks for Image Classification with Convolutional Neural Networks. Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li. 04 Dec 2018.