Improving Generalization Performance by Switching from Adam to SGD

20 December 2017
N. Keskar
R. Socher
    ODL
arXiv:1712.07628

Papers citing "Improving Generalization Performance by Switching from Adam to SGD"

50 / 181 papers shown
Multi-Span Optical Power Spectrum Evolution Modeling using ML-based Multi-Decoder Attention Framework
A. Raj
Zehao Wang
F. Slyne
Tingjun Chen
D. Kilper
Marco Ruffini
32
0
0
21 Mar 2025
High-entropy Advantage in Neural Networks' Generalizability
Entao Yang
Xuzhi Zhang
Yue Shang
Ge Zhang
AI4CE
66
0
0
17 Mar 2025
FUSE: First-Order and Second-Order Unified SynthEsis in Stochastic Optimization
Zhanhong Jiang
Md Zahid Hasan
Aditya Balu
Joshua R. Waite
Genyi Huang
S. Sarkar
52
0
0
06 Mar 2025
Online Prototypes and Class-Wise Hypergradients for Online Continual Learning with Pre-Trained Models
Nicolas Michel
Maorong Wang
Jiangpeng He
Toshihiko Yamasaki
CLL
59
0
0
26 Feb 2025
Towards Mitigating Architecture Overfitting on Distilled Datasets
Xuyang Zhong
Chen Liu
DD
55
0
0
08 Jan 2025
A Method for Enhancing Generalization of Adam by Multiple Integrations
Long Jin
Han Nong
Liangming Chen
Zhenming Su
70
0
0
17 Dec 2024
Adapter-Enhanced Semantic Prompting for Continual Learning
Baocai Yin
Ji Zhao
Huajie Jiang
Ningning Hou
Yongli Hu
Amin Beheshti
Ming-Hsuan Yang
Yuankai Qi
CLL
VLM
102
0
0
15 Dec 2024
Conformal Symplectic Optimization for Stable Reinforcement Learning
Yao Lyu
Xiangteng Zhang
Shengbo Eben Li
Jingliang Duan
Letian Tao
Qing Xu
Lei He
Keqiang Li
68
0
0
03 Dec 2024
Selfish Evolution: Making Discoveries in Extreme Label Noise with the Help of Overfitting Dynamics
Nima Sedaghat
Tanawan Chatchadanoraset
Colin Orion Chandler
Ashish Mahabal
Maryam Eslami
NoLa
91
0
0
26 Nov 2024
A Performance Increment Strategy for Semantic Segmentation of Low-Resolution Images from Damaged Roads
Rafael S. Toledo
Cristiano S. Oliveira
Vitor H. T. Oliveira
Eric A. Antonelo
Aldo von Wangenheim
62
0
0
25 Nov 2024
Active Learning for Vision-Language Models
Bardia Safaei
Vishal M. Patel
VLM
47
2
0
29 Oct 2024
Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach
Firas Bayram
Bestoun S. Ahmed
OOD
34
0
0
28 Oct 2024
Understanding Adam Requires Better Rotation Dependent Assumptions
Lucas Maes
Tianyue H. Zhang
Alexia Jolicoeur-Martineau
Ioannis Mitliagkas
Damien Scieur
Simon Lacoste-Julien
Charles Guille-Escuret
38
3
0
25 Oct 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo
Tan M. Nguyen
MoE
33
3
0
18 Oct 2024
Feasibility Analysis of Federated Neural Networks for Explainable Detection of Atrial Fibrillation
Diogo Reis Santos
Andrea Protani
Lorenzo Giusti
Albert Sund Aillet
Pierpaolo Brutti
Luigi Serio
FedML
16
0
0
14 Oct 2024
Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines
Edward Milsom
Ben Anson
Laurence Aitchison
28
0
0
08 Oct 2024
WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency
Pranav Jeevan
Neeraj Nixon
Amit Sethi
SupR
21
0
0
16 Sep 2024
Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training
Yuhan Ma
Dan Sun
Erdi Gao
Ningjing Sang
Iris Li
Guanming Huang
28
7
0
07 Sep 2024
Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
Dennis Chemnitz
Maximilian Engel
35
0
0
29 Jul 2024
Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
Yansheng Li
Tingzhu Wang
Kang Wu
Linlin Wang
Xin Guo
Wenbin Wang
60
0
0
27 Jul 2024
Large Kernel Distillation Network for Efficient Single Image Super-Resolution
Chengxing Xie
Xiaoming Zhang
Linze Li
Haiteng Meng
Tianlin Zhang
Tian-Ping Li
Xiaole Zhao
SupR
23
26
0
19 Jul 2024
HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
Junhao Su
Chenghao He
Feiyu Zhu
Xiaojie Xu
Dongzhi Guan
Chenyang Si
53
2
0
08 Jul 2024
MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network
Yuming Zhang
Shouxin Zhang
Peizhe Wang
Feiyu Zhu
Dongzhi Guan
Junhao Su
Jiabin Liu
Changpeng Cai
33
2
0
24 Jun 2024
Variational Stochastic Gradient Descent for Deep Neural Networks
Haotian Chen
Anna Kuzina
Babak Esmaeili
Jakub M. Tomczak
52
0
0
09 Apr 2024
Dynamic Memory Based Adaptive Optimization
Balázs Szegedy
Domonkos Czifra
Péter Korösi-Szabó
ODL
32
0
0
23 Feb 2024
Enhancing Power Quality Event Classification with AI Transformer Models
A. M. Saber
Amr Youssef
D. Svetinovic
H. H. Zeineldin
Deepa Kundur
Ehab El-Saadany
20
2
0
22 Feb 2024
SEBERTNets: Sequence Enhanced BERT Networks for Event Entity Extraction Tasks Oriented to the Finance Field
Congqing He
Xiangyu Zhu
Yuquan Le
Yuzhong Liu
Jianhong Yin
13
1
0
21 Jan 2024
One Step Learning, One Step Review
Xiaolong Huang
Qiankun Li
Xueran Li
Xuesong Gao
33
1
0
19 Jan 2024
AdamL: A fast adaptive gradient method incorporating loss function
Lu Xia
Stefano Massei
ODL
40
3
0
23 Dec 2023
Accelerating Neural Network Training: A Brief Review
Sahil Nokhwal
Priyanka Chilakalapudi
Preeti Donekal
Suman Nokhwal
Saurabh Pahune
Ankit Chaudhary
14
8
0
15 Dec 2023
AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix
Yun Yue
Zhiling Ye
Jiadi Jiang
Yongchao Liu
Ke Zhang
ODL
24
1
0
04 Dec 2023
A Comprehensive Study of Vision Transformers in Image Classification Tasks
Mahmoud Khalil
Ahmad Khalil
A. Ngom
ViT
21
8
0
02 Dec 2023
Signal Processing Meets SGD: From Momentum to Filter
Zhipeng Yao
Guisong Chang
Jiaqi Zhang
Qi Zhang
Dazhou Li
Yu Zhang
ODL
31
0
0
06 Nov 2023
Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization
Philipp Dahlinger
P. Becker
Maximilian Hüttenrauch
Gerhard Neumann
15
0
0
31 Oct 2023
Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization
Siddharth Singh
Zack Sating
A. Bhatele
ODL
38
0
0
18 Oct 2023
Adam-family Methods with Decoupled Weight Decay in Deep Learning
Kuang-Yu Ding
Nachuan Xiao
Kim-Chuan Toh
21
3
0
13 Oct 2023
Asymmetric Momentum: A Rethinking of Gradient Descent
Gongyue Zhang
Dinghuang Zhang
Shuwen Zhao
Donghan Liu
Carrie M. Toptan
Honghai Liu
ODL
14
1
0
05 Sep 2023
On the Implicit Bias of Adam
M. D. Cattaneo
Jason M. Klusowski
Boris Shigida
36
17
0
31 Aug 2023
We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual Learning Rate And Beyond
A. Khadangi
ODL
13
0
0
21 Aug 2023
Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity
Charlie Hou
K. K. Thekumparampil
Michael Shavlovsky
Giulia Fanti
Yesh Dattatreya
Sujay Sanghavi
LMTD
21
1
0
31 Jul 2023
Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis
Conor Hassan
Roberto Salomone
Kerrie Mengersen
23
6
0
28 Jul 2023
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
33
4
0
02 Jul 2023
WaveMixSR: A Resource-efficient Neural Network for Image Super-resolution
Pranav Jeevan
Akella Srinidhi
Pasunuri Prathiba
A. Sethi
SupR
31
9
0
01 Jul 2023
WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting
Pranav Jeevan
Dharshan Sampath Kumar
Amit Sethi
25
6
0
01 Jul 2023
Semantic Segmentation of Porosity in 4D Spatio-Temporal X-ray μCT of Titanium Coated Ni wires using Deep Learning
Pradyumna Elavarthi
Arun J. Bhattacharjee
A. P. Y. Puente
Anca L. Ralescu
22
0
0
24 Jun 2023
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu
Chaoyue Liu
Adityanarayanan Radhakrishnan
M. Belkin
32
14
0
07 Jun 2023
Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning
Achraf Bahamou
D. Goldfarb
ODL
36
0
0
23 May 2023
Mathematical Challenges in Deep Learning
V. Nia
Guojun Zhang
I. Kobyzev
Michael R. Metel
Xinlin Li
...
S. Hemati
M. Asgharian
Linglong Kong
Wulong Liu
Boxing Chen
AI4CE
VLM
37
1
0
24 Mar 2023
Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks
D. Pasechnyuk
Anton Prazdnichnykh
Mikhail Evtikhiev
T. Bryksin
32
1
0
06 Mar 2023
Learning to Generalize Provably in Learning to Optimize
Junjie Yang
Tianlong Chen
Mingkang Zhu
Fengxiang He
Dacheng Tao
Yitao Liang
Zhangyang Wang
31
6
0
22 Feb 2023