Improving Generalization Performance by Switching from Adam to SGD

20 December 2017

Papers citing "Improving Generalization Performance by Switching from Adam to SGD"

50 / 181 papers shown

Title | Authors | Tags | Counts | Date
Multi-Span Optical Power Spectrum Evolution Modeling using ML-based Multi-Decoder Attention Framework | A. Raj Zehao Wang F. Slyne Tingjun Chen D. Kilper Marco Ruffini | - | 32 0 0 | 21 Mar 2025
High-entropy Advantage in Neural Networks' Generalizability | Entao Yang Xuzhi Zhang Yue Shang Ge Zhang | AI4CE | 66 0 0 | 17 Mar 2025
FUSE: First-Order and Second-Order Unified SynthEsis in Stochastic Optimization | Zhanhong Jiang Md Zahid Hasan Aditya Balu Joshua R. Waite Genyi Huang S. Sarkar | - | 52 0 0 | 06 Mar 2025
Online Prototypes and Class-Wise Hypergradients for Online Continual Learning with Pre-Trained Models | Nicolas Michel Maorong Wang Jiangpeng He Toshihiko Yamasaki | CLL | 59 0 0 | 26 Feb 2025
Towards Mitigating Architecture Overfitting on Distilled Datasets | Xuyang Zhong Chen Liu | DD | 55 0 0 | 08 Jan 2025
A Method for Enhancing Generalization of Adam by Multiple Integrations | Long Jin Han Nong Liangming Chen Zhenming Su | - | 70 0 0 | 17 Dec 2024
Adapter-Enhanced Semantic Prompting for Continual Learning | Baocai Yin Ji Zhao Huajie Jiang Ningning Hou Yongli Hu Amin Beheshti Ming-Hsuan Yang Yuankai Qi | CLL VLM | 102 0 0 | 15 Dec 2024
Conformal Symplectic Optimization for Stable Reinforcement Learning | Yao Lyu Xiangteng Zhang Shengbo Eben Li Jingliang Duan Letian Tao Qing Xu Lei He Keqiang Li | - | 68 0 0 | 03 Dec 2024
Selfish Evolution: Making Discoveries in Extreme Label Noise with the Help of Overfitting Dynamics | Nima Sedaghat Tanawan Chatchadanoraset Colin Orion Chandler Ashish Mahabal Maryam Eslami | NoLa | 91 0 0 | 26 Nov 2024
A Performance Increment Strategy for Semantic Segmentation of Low-Resolution Images from Damaged Roads | Rafael S. Toledo Cristiano S. Oliveira Vitor H. T. Oliveira Eric A. Antonelo Aldo von Wangenheim | - | 62 0 0 | 25 Nov 2024
Active Learning for Vision-Language Models | Bardia Safaei Vishal M. Patel | VLM | 47 2 0 | 29 Oct 2024
Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach | Firas Bayram Bestoun S. Ahmed | OOD | 34 0 0 | 28 Oct 2024
Understanding Adam Requires Better Rotation Dependent Assumptions | Lucas Maes Tianyue H. Zhang Alexia Jolicoeur-Martineau Ioannis Mitliagkas Damien Scieur Simon Lacoste-Julien Charles Guille-Escuret | - | 38 3 0 | 25 Oct 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | R. Teo Tan M. Nguyen | MoE | 33 3 0 | 18 Oct 2024
Feasibility Analysis of Federated Neural Networks for Explainable Detection of Atrial Fibrillation | Diogo Reis Santos Andrea Protani Lorenzo Giusti Albert Sund Aillet Pierpaolo Brutti Luigi Serio | FedML | 16 0 0 | 14 Oct 2024
Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines | Edward Milsom Ben Anson Laurence Aitchison | - | 28 0 0 | 08 Oct 2024
WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency | Pranav Jeevan Neeraj Nixon Amit Sethi | SupR | 21 0 0 | 16 Sep 2024
Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training | Yuhan Ma Dan Sun Erdi Gao Ningjing Sang Iris Li Guanming Huang | - | 28 7 0 | 07 Sep 2024
Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning | Dennis Chemnitz Maximilian Engel | - | 35 0 0 | 29 Jul 2024
Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction | Yansheng Li Tingzhu Wang Kang Wu Linlin Wang Xin Guo Wenbin Wang | - | 60 0 0 | 27 Jul 2024
Large Kernel Distillation Network for Efficient Single Image Super-Resolution | Chengxing Xie Xiaoming Zhang Linze Li Haiteng Meng Tianlin Zhang Tian-Ping Li Xiaole Zhao | SupR | 23 26 0 | 19 Jul 2024
HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion | Junhao Su Chenghao He Feiyu Zhu Xiaojie Xu Dongzhi Guan Chenyang Si | - | 53 2 0 | 08 Jul 2024
MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network | Yuming Zhang Shouxin Zhang Peizhe Wang Feiyu Zhu Dongzhi Guan Junhao Su Jiabin Liu Changpeng Cai | - | 33 2 0 | 24 Jun 2024
Variational Stochastic Gradient Descent for Deep Neural Networks | Haotian Chen Anna Kuzina Babak Esmaeili Jakub M. Tomczak | - | 52 0 0 | 09 Apr 2024
Dynamic Memory Based Adaptive Optimization | Balázs Szegedy Domonkos Czifra Péter Korösi-Szabó | ODL | 32 0 0 | 23 Feb 2024
Enhancing Power Quality Event Classification with AI Transformer Models | A. M. Saber Amr Youssef D. Svetinovic H. H. Zeineldin Deepa Kundur Ehab El-Saadany | - | 20 2 0 | 22 Feb 2024
SEBERTNets: Sequence Enhanced BERT Networks for Event Entity Extraction Tasks Oriented to the Finance Field | Congqing He Xiangyu Zhu Yuquan Le Yuzhong Liu Jianhong Yin | - | 13 1 0 | 21 Jan 2024
One Step Learning, One Step Review | Xiaolong Huang Qiankun Li Xueran Li Xuesong Gao | - | 33 1 0 | 19 Jan 2024
AdamL: A fast adaptive gradient method incorporating loss function | Lu Xia Stefano Massei | ODL | 40 3 0 | 23 Dec 2023
Accelerating Neural Network Training: A Brief Review | Sahil Nokhwal Priyanka Chilakalapudi Preeti Donekal Suman Nokhwal Saurabh Pahune Ankit Chaudhary | - | 14 8 0 | 15 Dec 2023
AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix | Yun Yue Zhiling Ye Jiadi Jiang Yongchao Liu Ke Zhang | ODL | 24 1 0 | 04 Dec 2023
A Comprehensive Study of Vision Transformers in Image Classification Tasks | Mahmoud Khalil Ahmad Khalil A. Ngom | ViT | 21 8 0 | 02 Dec 2023
Signal Processing Meets SGD: From Momentum to Filter | Zhipeng Yao Guisong Chang Jiaqi Zhang Qi Zhang Dazhou Li Yu Zhang | ODL | 31 0 0 | 06 Nov 2023
Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization | Philipp Dahlinger P. Becker Maximilian Hüttenrauch Gerhard Neumann | - | 15 0 0 | 31 Oct 2023
Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization | Siddharth Singh Zack Sating A. Bhatele | ODL | 38 0 0 | 18 Oct 2023
Adam-family Methods with Decoupled Weight Decay in Deep Learning | Kuang-Yu Ding Nachuan Xiao Kim-Chuan Toh | - | 21 3 0 | 13 Oct 2023
Asymmetric Momentum: A Rethinking of Gradient Descent | Gongyue Zhang Dinghuang Zhang Shuwen Zhao Donghan Liu Carrie M. Toptan Honghai Liu | ODL | 14 1 0 | 05 Sep 2023
On the Implicit Bias of Adam | M. D. Cattaneo Jason M. Klusowski Boris Shigida | - | 36 17 0 | 31 Aug 2023
We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual Learning Rate And Beyond | A. Khadangi | ODL | 13 0 0 | 21 Aug 2023
Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity | Charlie Hou K. K. Thekumparampil Michael Shavlovsky Giulia Fanti Yesh Dattatreya Sujay Sanghavi | LMTD | 21 1 0 | 31 Jul 2023
Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis | Conor Hassan Roberto Salomone Kerrie Mengersen | - | 23 6 0 | 28 Jul 2023
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers | Yineng Chen Z. Li Lefei Zhang Bo Du Hai Zhao | - | 33 4 0 | 02 Jul 2023
WaveMixSR: A Resource-efficient Neural Network for Image Super-resolution | Pranav Jeevan Akella Srinidhi Pasunuri Prathiba A. Sethi | SupR | 31 9 0 | 01 Jul 2023
WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting | Pranav Jeevan Dharshan Sampath Kumar Amit Sethi | - | 25 6 0 | 01 Jul 2023
Semantic Segmentation of Porosity in 4D Spatio-Temporal X-ray μCT of Titanium Coated Ni wires using Deep Learning | Pradyumna Elavarthi Arun J. Bhattacharjee A. P. Y. Puente Anca L. Ralescu | - | 22 0 0 | 24 Jun 2023
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning | Libin Zhu Chaoyue Liu Adityanarayanan Radhakrishnan M. Belkin | - | 32 14 0 | 07 Jun 2023
Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning | Achraf Bahamou D. Goldfarb | ODL | 36 0 0 | 23 May 2023
Mathematical Challenges in Deep Learning | V. Nia Guojun Zhang I. Kobyzev Michael R. Metel Xinlin Li ... S. Hemati M. Asgharian Linglong Kong Wulong Liu Boxing Chen | AI4CE VLM | 37 1 0 | 24 Mar 2023
Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks | D. Pasechnyuk Anton Prazdnichnykh Mikhail Evtikhiev T. Bryksin | - | 32 1 0 | 06 Mar 2023
Learning to Generalize Provably in Learning to Optimize | Junjie Yang Tianlong Chen Mingkang Zhu Fengxiang He Dacheng Tao Yitao Liang Zhangyang Wang | - | 31 6 0 | 22 Feb 2023