Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network

21 March 2024

Papers citing "Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network"

5 / 5 papers shown

Title
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be Frederik Kunstner Jacques Chen J. Lavington Mark W. Schmidt 40 67 0 27 Apr 2023
Training Structured Neural Networks Through Manifold Identification and Variance Reduction Zih-Syuan Huang Ching-pei Lee AAML 48 9 0 05 Dec 2021
Dual Averaging is Surprisingly Effective for Deep Learning Optimization Samy Jelassi Aaron Defazio 28 4 0 20 Oct 2020
A Simple Convergence Proof of Adam and Adagrad Alexandre Défossez Léon Bottou Francis R. Bach Nicolas Usunier 56 143 0 05 Mar 2020
ImageNet Large Scale Visual Recognition Challenge Olga Russakovsky Jia Deng Hao Su J. Krause S. Satheesh ... A. Karpathy A. Khosla Michael S. Bernstein Alexander C. Berg Li Fei-Fei VLM ObjD 296 39,198 0 01 Sep 2014