1711.00489
Don't Decay the Learning Rate, Increase the Batch Size
1 November 2017
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
Papers citing
"Don't Decay the Learning Rate, Increase the Batch Size"
50 / 454 papers shown
Title
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
Junhan Kim
Kyungphil Park
Chungman Lee
Ho-Young Kim
Joonyoung Kim
Yongkweon Jeon
MQ
108
3
0
14 Feb 2024
AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size
P. Ostroukhov
Aigerim Zhumabayeva
Chulu Xiang
Alexander Gasnikov
Martin Takáč
Dmitry Kamzolov
ODL
88
2
0
07 Feb 2024
Flora: Low-Rank Adapters Are Secretly Gradient Compressors
Yongchang Hao
Yanshuai Cao
Lili Mou
98
55
0
05 Feb 2024
Glocal Hypergradient Estimation with Koopman Operator
Ryuichiro Hataya
Yoshinobu Kawahara
114
2
0
05 Feb 2024
Understanding the Generalization Benefits of Late Learning Rate Decay
Yinuo Ren
Chao Ma
Lexing Ying
AI4CE
72
6
0
21 Jan 2024
FourCastNeXt: Optimizing FourCastNet Training for Limited Compute
Edison Guo
Maruf Ahmed
Yue Sun
Rui Yang
Harrison Cook
Tennessee Leeuwenburg
Ben Evans
45
1
0
10 Jan 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
209
382
0
05 Jan 2024
On the Role of Server Momentum in Federated Learning
Jianhui Sun
Xidong Wu
Heng-Chiao Huang
Aidong Zhang
FedML
118
11
0
19 Dec 2023
Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
Yefan Zhou
Tianyu Pang
Keqin Liu
Charles H. Martin
Michael W. Mahoney
Yaoqing Yang
145
12
0
01 Dec 2023
Who is leading in AI? An analysis of industry AI research
Ben Cottier
T. Besiroglu
David Owen
126
8
0
24 Nov 2023
Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato
Hideaki Iiduka
84
3
0
15 Nov 2023
Refining the ONCE Benchmark with Hyperparameter Tuning
Maksim Golyadkin
Alexander Gambashidze
Ildar Nurgaliev
Ilya Makarov
67
1
0
10 Nov 2023
Prediction of Effective Elastic Moduli of Rocks using Graph Neural Networks
Jaehong Chung
R. Ahmad
WaiChing Sun
Wei Cai
T. Mukerji
66
9
0
30 Oct 2023
On the accuracy and efficiency of group-wise clipping in differentially private optimization
Zhiqi Bu
Ruixuan Liu
Yu Wang
Sheng Zha
George Karypis
VLM
68
4
0
30 Oct 2023
Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens
Ross M. Clarke
José Miguel Hernández-Lobato
128
2
0
23 Oct 2023
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression
Adam Block
Dylan J. Foster
Akshay Krishnamurthy
Max Simchowitz
Cyril Zhang
88
7
0
17 Oct 2023
An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent
Zhao Song
Chiwun Yang
90
10
0
17 Oct 2023
Defect Analysis of 3D Printed Cylinder Object Using Transfer Learning Approaches
M. Ahsan
Shivakumar Raman
Zahed Siddique
67
1
0
12 Oct 2023
A Convolutional Network Adaptation for Cortical Classification During Mobile Brain Imaging
B. Cichy
J. Lukos
Mohammad Alam
J. C. Bradford
Nicholas Wymbs
44
0
0
11 Oct 2023
Scaling Laws for Associative Memories
Vivien A. Cabannes
Elvis Dohmatob
A. Bietti
142
21
0
04 Oct 2023
Coupling public and private gradient provably helps optimization
Ruixuan Liu
Zhiqi Bu
Yu Wang
Sheng Zha
George Karypis
80
2
0
02 Oct 2023
Modularity in Deep Learning: A Survey
Haozhe Sun
Isabelle Guyon
MoMe
110
3
0
02 Oct 2023
Joint Sampling and Optimisation for Inverse Rendering
Martin Balint
K. Myszkowski
Hans-Peter Seidel
Gurprit Singh
25
1
0
27 Sep 2023
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang
Jinlong Liu
Zixiang Ding
Lin Guo
W. Lin
AI4CE
58
2
0
24 Sep 2023
Invisible Watermarking for Audio Generation Diffusion Models
Xirong Cao
Xia Li
D. Jadav
Yanzhao Wu
Zhehui Chen
Chen Zeng
Wenqi Wei
WIGM
85
9
0
22 Sep 2023
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić
Dylan R. Ashley
Oleg Serikov
Louis Kirsch
Francesco Faccio
Jürgen Schmidhuber
Thomas Hofmann
Imanol Schlag
MoE
91
9
0
20 Sep 2023
Monitoring of Urban Changes with multi-modal Sentinel 1 and 2 Data in Mariupol, Ukraine, in 2022/23
Georg Zitzlsberger
M. Podhorányi
68
0
0
11 Aug 2023
Performance Analysis of Transformer Based Models (BERT, ALBERT and RoBERTa) in Fake News Detection
Shafna Fitria Nur Azizah
Hasan Dwi Cahyono
S. W. Sihwi
Wisnu Widiarto
31
13
0
09 Aug 2023
Conditioning Generative Latent Optimization for Sparse-View CT Image Reconstruction
Thomas Braure
Delphine Lazaro
David Hateau
Vincent Brandon
Kévin Ginsburger
MedIm
73
0
0
31 Jul 2023
How to Scale Your EMA
Dan Busbridge
Jason Ramapuram
Pierre Ablin
Tatiana Likhomanenko
Eeshan Gunesh Dhekane
Xavier Suau
Russ Webb
82
19
0
25 Jul 2023
Batching for Green AI -- An Exploratory Study on Inference
Tim Yarally
Luís Cruz
Daniel Feitosa
June Sallou
A. V. Deursen
59
5
0
21 Jul 2023
Efficient Convolution and Transformer-Based Network for Video Frame Interpolation
Issa Khalifeh
L. Murn
M. Mrak
E. Izquierdo
ViT
81
2
0
12 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
129
45
0
12 Jul 2023
The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks
Mor Shpigel Nacson
Rotem Mulayoff
Greg Ongie
T. Michaeli
Daniel Soudry
86
13
0
30 Jun 2023
Separable Physics-Informed Neural Networks
Junwoo Cho
Seungtae Nam
Hyunmo Yang
S. Yun
Youngjoon Hong
Eunbyung Park
PINN
AI4CE
88
48
0
28 Jun 2023
Deep Huber quantile regression networks
Hristos Tyralis
Georgia Papacharalampous
N. Dogulu
Kwok-Pan Chun
UQCV
145
2
0
17 Jun 2023
When and Why Momentum Accelerates SGD: An Empirical Study
Jingwen Fu
Bohan Wang
Huishuai Zhang
Zhizheng Zhang
Wei Chen
Na Zheng
71
10
0
15 Jun 2023
Temporal Gradient Inversion Attacks with Robust Optimization
Bowen Li
Hanlin Gu
Ruoxin Chen
Jie Li
Chentao Wu
Na Ruan
Xueming Si
Lixin Fan
AAML
76
2
0
13 Jun 2023
Enhance Diffusion to Improve Robust Generalization
Jianhui Sun
Sanchit Sinha
Aidong Zhang
79
4
0
05 Jun 2023
Intelligent gradient amplification for deep neural networks
S. Basodi
K. Pusuluri
Xueli Xiao
Yi Pan
ODL
40
1
0
29 May 2023
Semantic segmentation of sparse irregular point clouds for leaf/wood discrimination
Yuchen Bai
Jean-Baptiste Durand
Grégoire Laurent Vincent
F. Forbes
3DPC
31
8
0
26 May 2023
Dynamic Masking Rate Schedules for MLM Pretraining
Zachary Ankner
Naomi Saphra
Davis W. Blalock
Jonathan Frankle
Matthew L. Leavitt
101
8
0
24 May 2023
Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching
S. Tyagi
Prateek Sharma
86
22
0
20 May 2023
Phase transitions in the mini-batch size for sparse and dense two-layer neural networks
Raffaele Marino
F. Ricci-Tersenghi
114
15
0
10 May 2023
Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load
Maximilian Egger
Serge Kas Hanna
Rawad Bitar
FedML
71
1
0
17 Apr 2023
Deep neural networks have an inbuilt Occam's razor
Chris Mingard
Henry Rees
Guillermo Valle Pérez
A. Louis
UQCV
BDL
87
16
0
13 Apr 2023
SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
Kfir Y. Levy
FedML
103
3
0
09 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
105
43
0
07 Apr 2023
Inductive biases in deep learning models for weather prediction
Jannik Thümmel
Matthias Karlbauer
S. Otte
C. Zarfl
Georg Martius
...
Thomas Scholten
Ulrich Friedrich
V. Wulfmeyer
B. Goswami
Martin Volker Butz
AI4CE
113
6
0
06 Apr 2023
Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability
Haoyi Xiong
Xuhong Li
Bo Yu
Zhanxing Zhu
Dongrui Wu
Dejing Dou
NoLa
62
0
0
01 Apr 2023