Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer, Itay Hubara, Daniel Soudry · arXiv:1705.08741 · 24 May 2017 · ODL
Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks" (showing 50 of 145)
Gradient Descent as a Shrinkage Operator for Spectral Bias
Simon Lucey · 25 Apr 2025

On the use of neural networks for the structural characterization of polymeric porous materials
Jorge Torre, Suset Barroso-Solares, M.A. Rodríguez-Pérez, Javier Pinto · 25 Jan 2025

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar · 30 Dec 2024

Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words
Kento Nozawa, Takashi Masuko, Toru Taniguchi · 15 Aug 2024

Preserving Multilingual Quality While Tuning Query Encoder on English Only
Oleg V. Vasilyev, Randy Sawaya, John Bohannon · 01 Jul 2024

Spreeze: High-Throughput Parallel Reinforcement Learning Framework
Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang · 11 Dec 2023 · OffRL

BCN: Batch Channel Normalization for Image Classification
Afifa Khaled, Chao Li, Jia Ning, Kun He · 01 Dec 2023

LEOD: Label-Efficient Object Detection for Event Cameras
Ziyi Wu, Mathias Gehrig, Qing Lyu, Xudong Liu, Igor Gilitschenski · 29 Nov 2023

Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective
Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, Yisen Wang · 30 Oct 2023

ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang, Ming Hao, Yuhai Shi, Bo Xu · 05 Aug 2023 · MoMe

Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang, S. Shi, Bo-wen Li · 04 Aug 2023

Addressing caveats of neural persistence with deep graph persistence
Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke · 20 Jul 2023 · GNN

Accelerating Distributed ML Training via Selective Synchronization
S. Tyagi, Martin Swany · 16 Jul 2023 · FedML

A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks
Vignesh Kothapalli, Tom Tirer, Joan Bruna · 04 Jul 2023

DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi, Shahar Gottlieb, Moran Shkolnik, A. Karnieli, Ron Banner, Elad Hoffer, Kfir Y. Levy, Daniel Soudry · 18 Jun 2023

Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn, B. Rosenow · 08 Jun 2023

Ghost Noise for Regularizing Deep Neural Networks
Atli Kosson, Dongyang Fan, Martin Jaggi · 26 May 2023

On the Optimal Batch Size for Byzantine-Robust Distributed Learning
Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li · 23 May 2023 · FedML, AAML

GeNAS: Neural Architecture Search with Better Generalization
Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Y. Yoo · 15 May 2023

Do deep neural networks have an inbuilt Occam's razor?
Chris Mingard, Henry Rees, Guillermo Valle Pérez, A. Louis · 13 Apr 2023 · UQCV, BDL

SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
Kfir Y. Levy · 09 Apr 2023 · FedML

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao · 07 Apr 2023 · VLM

Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
Thanh-Dung Le, P. Jouvet, R. Noumeir · 22 Mar 2023 · MoE, MedIm

How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy
Natalia Ponomareva, Hussein Hazimeh, Alexey Kurakin, Zheng Xu, Carson E. Denison, H. B. McMahan, Sergei Vassilvitskii, Steve Chien, Abhradeep Thakurta · 01 Mar 2023

MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning
Caoyun Fan, Wenqing Chen, Jidong Tian, Yitian Li, Hao He, Yaohui Jin · 18 Feb 2023

Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Antonio Sclocchi, Mario Geiger, M. Wyart · 31 Jan 2023

StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Axel Sauer, Tero Karras, S. Laine, Andreas Geiger, Timo Aila · 23 Jan 2023

Stability Analysis of Sharpness-Aware Minimization
Hoki Kim, Jinseong Park, Yujin Choi, Jaewook Lee · 16 Jan 2023

FedGPO: Heterogeneity-Aware Global Parameter Optimization for Efficient Federated Learning
Young Geun Kim, Carole-Jean Wu · 30 Nov 2022 · FedML

ModelDiff: A Framework for Comparing Learning Algorithms
Harshay Shah, Sung Min Park, Andrew Ilyas, A. Madry · 22 Nov 2022 · SyDa

Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Dmitry Akimov, Sergey Kolesnikov · 20 Nov 2022 · OffRL

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang, Yongyi Mao · 19 Nov 2022

Perturbation Analysis of Neural Collapse
Tom Tirer, Haoxiang Huang, Jonathan Niles-Weed · 29 Oct 2022 · AAML

A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
O. Oyedotun, Konstantinos Papadopoulos, Djamila Aouada · 21 Oct 2022 · AI4CE

MSRL: Distributed Reinforcement Learning with Dataflow Fragments
Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi, Yaodong Yang, Peter R. Pietzuch, Lei Chen · 03 Oct 2022 · OffRL, MoE

Why neural networks find simple solutions: the many regularizers of geometric complexity
Benoit Dherin, Michael Munn, M. Rosca, David Barrett · 27 Sep 2022

Rethinking Performance Gains in Image Dehazing Networks
Yuda Song, Yang Zhou, Hui Qian, Xin Du · 23 Sep 2022 · SSeg

Batch Layer Normalization, A new normalization layer for CNNs and RNN
A. Ziaee, Erion Çano · 19 Sep 2022

ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, A. Raju, ..., Andrew Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure · 19 Jul 2022 · CLL

Efficient Augmentation for Imbalanced Deep Learning
Damien Dablain, C. Bellinger, Bartosz Krawczyk, Nitesh V. Chawla · 13 Jul 2022

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang, S. Shi, Wei Wang, Bo-wen Li · 30 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora · 14 Jun 2022 · FAtt

Towards Understanding Sharpness-Aware Minimization
Maksym Andriushchenko, Nicolas Flammarion · 13 Jun 2022 · AAML

Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen, Ming Hu, Boyang Albert Li, Mohamed Elhoseiny · 01 Jun 2022

Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD
Konstantinos E. Nikolakakis, Farzin Haddadpour, Amin Karbasi, Dionysios S. Kalogerias · 26 Apr 2022

CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU
Zangwei Zheng, Peng Xu, Xuan Zou, Da Tang, Zhen Li, ..., Xiangzhuo Ding, Fuzhao Xue, Ziheng Qing, Youlong Cheng, Yang You · 13 Apr 2022 · VLM

Small Batch Sizes Improve Training of Low-Resource Neural MT
Àlex R. Atrio, Andrei Popescu-Belis · 20 Mar 2022

On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis
Dominik Rivoir, Isabel Funke, Stefanie Speidel · 15 Mar 2022

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao · 07 Mar 2022

Regularising for invariance to data augmentation improves supervised learning
Aleksander Botev, Matthias Bauer, Soham De · 07 Mar 2022