1905.11286
Cited By
Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks
27 May 2019
Boris Ginsburg
P. Castonguay
Oleksii Hrinchuk
Oleksii Kuchaiev
Vitaly Lavrukhin
Ryan Leary
Jason Li
Huyen Nguyen
Yang Zhang
Jonathan M. Cohen
ODL
Papers citing
"Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks"
4 / 4 papers shown
How to Fine-Tune Vision Models with SGD
Ananya Kumar
Ruoqi Shen
Sébastien Bubeck
Suriya Gunasekar
VLM
14
29
0
17 Nov 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla
Jing Wang
A. Choromańska
27
34
0
20 Jan 2022
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,833
0
17 Sep 2019
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He
Zhi Zhang
Hang Zhang
Zhongyue Zhang
Junyuan Xie
Mu Li
224
1,400
0
04 Dec 2018