2304.09871
Cited By
A Theory on Adam Instability in Large-Scale Machine Learning
19 April 2023
Igor Molybog
Peter Albert
Moya Chen
Zach DeVito
David Esiobu
Naman Goyal
Punit Singh Koura
Sharan Narang
Andrew Poulton
Ruan Silva
Binh Tang
Diana Liskovich
Puxin Xu
Yuchen Zhang
Melanie Kambadur
Stephen Roller
Susan Zhang
AI4CE
Papers citing
"A Theory on Adam Instability in Large-Scale Machine Learning"
24 / 24 papers shown
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren
Yong Liu
132
0
0
26 Apr 2025
Numerical Error Analysis of Large Language Models
Stanislav Budzinskiy
Wenyi Fang
Longbin Zeng
Philipp Petersen
47
1
0
13 Mar 2025
Stochastic Rounding for LLM Training: Theory and Practice
Kaan Ozkara
Tao Yu
Youngsuk Park
43
0
0
27 Feb 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu (Allen) Zhang
Gaojie Jin
Xianrui Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
39
0
0
24 Feb 2025
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective
Sizhuang He
Ananyae Kumar Bhartari
Bowen Li
P. Perdikaris
PINN
56
4
0
02 Feb 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zhangyang Wang
Shiwei Liu
44
1
0
12 Jan 2025
Beyond Normal: Learning Spatial Density Models of Node Mobility
Wanxin Gao
Ioanis Nikolaidis
Janelle Harms
18
0
0
17 Nov 2024
Methods of improving LLM training stability
Oleg Rybakov
Mike Chrzanowski
Peter Dykas
Jinze Xue
Ben Lanir
26
1
0
22 Oct 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo
Tan M. Nguyen
MoE
33
3
0
18 Oct 2024
Scaling Laws For Diffusion Transformers
Zhengyang Liang
Hao He
Ceyuan Yang
Bo Dai
27
9
0
10 Oct 2024
Geometrical structures of digital fluctuations in parameter space of neural networks trained with adaptive momentum optimization
Igor V. Netay
39
0
0
22 Aug 2024
Anytime-Valid Inference for Double/Debiased Machine Learning of Causal Parameters
Abhinandan Dalal
Patrick Blobaum
S. Kasiviswanathan
Aaditya Ramdas
AI4CE
25
0
0
18 Aug 2024
Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology
Eric Zimmermann
Eugene Vorontsov
Julian Viret
Adam Casson
Michal Zelechowski
...
Razik Yousfi
Thomas J. Fuchs
Nicolò Fusi
Siqi Liu
Kristen Severson
MedIm
33
29
0
01 Aug 2024
GEB-1.3B: Open Lightweight Large Language Model
Jie Wu
Yufeng Zhu
Lei Shen
Xuqing Lu
ALM
29
0
0
14 Jun 2024
Is Flash Attention Stable?
Alicia Golden
Samuel Hsia
Fei Sun
Bilge Acun
Basil Hosmer
...
Zachary DeVito
Jeff Johnson
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
29
5
0
05 May 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhimin Luo
37
43
0
26 Feb 2024
Jointly Training Large Autoregressive Multimodal Models
Emanuele Aiello
L. Yu
Yixin Nie
Armen Aghajanyan
Barlas Oğuz
19
29
0
27 Sep 2023
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
35
81
0
25 Sep 2023
XGen-7B Technical Report
Erik Nijkamp
Tian Xie
Hiroaki Hayashi
Bo Pang
Congying Xia
...
Chien-Sheng Wu
Silvio Savarese
Yingbo Zhou
Shafiq R. Joty
Caiming Xiong
ALM
26
13
0
07 Sep 2023
On the Implicit Bias of Adam
M. D. Cattaneo
Jason M. Klusowski
Boris Shigida
31
17
0
31 Aug 2023
Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant
Xianbiao Qi
Jianan Wang
Lei Zhang
15
0
0
15 Jun 2023
ReContrast: Domain-Specific Anomaly Detection via Contrastive Reconstruction
Jia Guo
Shuai Lu
Lize Jia
Weihang Zhang
Huiqi Li
21
23
0
05 Jun 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
33
0
0
25 May 2023
Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman
Tim Dettmers
Luke Zettlemoyer
Ari S. Morcos
Ali Farhadi
Ludwig Schmidt
MQ
MLLM
VLM
24
38
0
25 Apr 2023