Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer, Itay Hubara, Daniel Soudry · 24 May 2017 · ODL
Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

50 / 156 papers shown

On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis
Dominik Rivoir, Isabel Funke, Stefanie Speidel · 15 Mar 2022

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao · 07 Mar 2022

Regularising for invariance to data augmentation improves supervised learning
Aleksander Botev, Matthias Bauer, Soham De · 07 Mar 2022

Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Youjie Li, Amar Phanishayee, D. Murray, Jakub Tarnawski, N. Kim · 02 Feb 2022

Memory-Efficient Backpropagation through Large Linear Layers
Daniel Bershatsky, A. Mikhalev, A. Katrutsa, Julia Gusak, D. Merkulov, Ivan Oseledets · 31 Jan 2022

On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li · 31 Jan 2022 · ODL

Toward Training at ImageNet Scale with Differential Privacy
Alexey Kurakin, Shuang Song, Steve Chien, Roxana Geambasu, Andreas Terzis, Abhradeep Thakurta · 28 Jan 2022

Non-Asymptotic Analysis of Online Multiplicative Stochastic Gradient Descent
Riddhiman Bhattacharya, Tiefeng Jiang · 14 Dec 2021

On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective
Xiaowu Dai, Yuhua Zhu · 02 Dec 2021

Hybrid BYOL-ViT: Efficient approach to deal with small datasets
Safwen Naimi, Rien van Leeuwen, W. Souidène, S. B. Saoud · 08 Nov 2021 · SSL, ViT

Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi, Masaaki Imaizumi · 07 Nov 2021

Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You · 01 Nov 2021

Trade-offs of Local SGD at Scale: An Empirical Study
Jose Javier Gonzalez Ortiz, Jonathan Frankle, Michael G. Rabbat, Ari S. Morcos, Nicolas Ballas · 15 Oct 2021 · FedML

Spectral Bias in Practice: The Role of Function Frequency in Generalization
Sara Fridovich-Keil, Raphael Gontijo-Lopes, Rebecca Roelofs · 06 Oct 2021

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein · 29 Sep 2021

How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
Zhiyuan Zhang, Lingjuan Lyu, Weiqiang Wang, Lichao Sun, Xu Sun · 03 Sep 2021

Shift-Curvature, SGD, and Generalization
Arwen V. Bradley, C. Gomez-Uribe, Manish Reddy Vuyyuru · 21 Aug 2021

Logit Attenuating Weight Normalization
Aman Gupta, R. Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Keerthi · 12 Aug 2021

Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters
Chen Sun, Shenggui Li, Jinyue Wang, Jun Yu · 08 Aug 2021

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins · 19 Jul 2021

Bag of Tricks for Neural Architecture Search
T. Elsken, B. Staffler, Arber Zela, J. H. Metzen, Frank Hutter · 08 Jul 2021

On Large-Cohort Training for Federated Learning
Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith · 15 Jun 2021 · FedML

Concurrent Adversarial Learning for Large-Batch Training
Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You · 01 Jun 2021 · ODL

"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen, Zhenyu (Allen) Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang · 16 Apr 2021 · MQ

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
Zeke Xie, Li-xin Yuan, Zhanxing Zhu, Masashi Sugiyama · 31 Mar 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora · 24 Feb 2021

Low Curvature Activations Reduce Overfitting in Adversarial Training
Vasu Singla, Sahil Singla, David Jacobs, S. Feizi · 15 Feb 2021 · AAML

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
Guojun Xiong, Gang Yan, Rahul Singh, Jian Li · 11 Feb 2021

High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan · 11 Feb 2021 · VLM

A spin-glass model for the loss surfaces of generative adversarial networks
Nicholas P. Baskerville, J. Keating, F. Mezzadri, J. Najnudel · 07 Jan 2021 · GAN

Learning from History for Byzantine Robust Optimization
Sai Praneeth Karimireddy, Lie He, Martin Jaggi · 18 Dec 2020 · FedML, AAML

Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar · 16 Dec 2020

Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent
Kangqiao Liu, Liu Ziyin, Masakuni Ueda · 07 Dec 2020 · MLT

EvoPose2D: Pushing the Boundaries of 2D Human Pose Estimation using Accelerated Neuroevolution with Weight Transfer
William J. McNally, Kanav Vats, Alexander Wong, J. McPhee · 17 Nov 2020 · 3DH

Regularizing Neural Networks via Adversarial Model Perturbation
Yaowei Zheng, Richong Zhang, Yongyi Mao · 10 Oct 2020 · AAML

Improved generalization by noise enhancement
Takashi Mori, Masahito Ueda · 28 Sep 2020

Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller, Mario Geiger, Tess E. Smidt, Frank Noé · 19 Aug 2020

BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition
Y. Kim, Wonpyo Park, Jongju Shin · 15 Aug 2020 · CVBM

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li · 28 Jul 2020 · ODL

AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin · 09 Jul 2020 · ODL

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol, S. Zohren, Stephen J. Roberts · 16 Jun 2020 · ODL

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, J. Lee, Tengyu Ma · 15 Jun 2020

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran · 15 May 2020 · MLT

Predicting the outputs of finite deep neural networks trained with noisy gradients
Gadi Naveh, Oded Ben-David, H. Sompolinsky, Zohar Ringel · 02 Apr 2020

AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks
Majed El Helou, Frederike Dumbgen, Sabine Süsstrunk · 07 Mar 2020 · CLL, AI4CE

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith · 24 Feb 2020 · ODL

The Two Regimes of Deep Network Training
Guillaume Leclerc, A. Madry · 24 Feb 2020

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama · 10 Feb 2020 · ODL

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta, S. Serrano, D. DeCoste · 07 Jan 2020 · MoMe

Information-Theoretic Local Minima Characterization and Regularization
Zhiwei Jia, Hao Su · 19 Nov 2019