Measuring the Effects of Data Parallelism on Neural Network Training

8 November 2018
Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl

Papers citing "Measuring the Effects of Data Parallelism on Neural Network Training"

50 / 100 papers shown

Training and Inference Efficiency of Encoder-Decoder Speech Models
Piotr Żelasko, Kunal Dhawan, Daniel Galvez, Krishna C. Puvvada, Ankita Pasad, Nithin Rao Koluguri, Ke Hu, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg
07 Mar 2025

Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent
Hikaru Umeda, Hideaki Iiduka
17 Feb 2025

On the use of neural networks for the structural characterization of polymeric porous materials
Jorge Torre, Suset Barroso-Solares, M.A. Rodríguez-Pérez, Javier Pinto
25 Jan 2025

Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov, Jan Ebert, Jiangtao Wang, Stefan Kesselheim
10 Jan 2025

iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar
08 Jan 2025 · VLM

Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin, Yaqi Zhao, Mingwu Zheng, Ke Lin, Jiarong Ou, ..., Pengfei Wan, Di Zhang, Baoqun Yin, Wentao Zhang, Kun Gai
03 Jan 2025

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
30 Dec 2024

How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade
29 Oct 2024

Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade
10 Jul 2024

Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian, Mitchell Wortsman, J. Jitsev, Ludwig Schmidt, Y. Carmon
27 Jun 2024

High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
Atish Agarwala, Jeffrey Pennington
30 Apr 2024

AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size
P. Ostroukhov, Aigerim Zhumabayeva, Chulu Xiang, Alexander Gasnikov, Martin Takáč, Dmitry Kamzolov
07 Feb 2024 · ODL

In Search of a Data Transformation That Accelerates Neural Field Training
Junwon Seo, Sangyoon Lee, Kwang In Kim, Jaeho Lee
28 Nov 2023

Generalization Bounds for Label Noise Stochastic Gradient Descent
Jung Eun Huh, Patrick Rebeschini
01 Nov 2023

Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching
S. Tyagi, Prateek Sharma
20 May 2023

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models
Alexandru Damian, Eshaan Nichani, Rong Ge, Jason D. Lee
18 May 2023 · MLT

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui
17 May 2023

Scaling Expert Language Models with Unsupervised Domain Discovery
Suchin Gururangan, Margaret Li, M. Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
24 Mar 2023 · MoE

Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Antonio Sclocchi, Mario Geiger, M. Wyart
31 Jan 2023

RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Alessandro Ottino, Joshua L. Benjamin, G. Zervas
28 Nov 2022

VeLO: Training Versatile Learned Optimizers by Scaling Up
Luke Metz, James Harrison, C. Freeman, Amil Merchant, Lucas Beyer, ..., Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Narain Sohl-Dickstein
17 Nov 2022

Breadth-First Pipeline Parallelism
J. Lamy-Poirier
11 Nov 2022 · GNN · MoE · AI4CE

Neuro-symbolic partial differential equation solver
Pouria A. Mistani, Samira Pakravan, Rajesh Ilango, S. Choudhry, Frédéric Gibou
25 Oct 2022

JAX-DIPS: Neural bootstrapping of finite discretization methods and application to elliptic problems with discontinuities
Pouria A. Mistani, Samira Pakravan, Rajesh Ilango, Frédéric Gibou
25 Oct 2022

A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
O. Oyedotun, Konstantinos Papadopoulos, Djamila Aouada
21 Oct 2022 · AI4CE

Large-batch Optimization for Dense Visual Predictions
Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu, Ping Luo
20 Oct 2022 · VLM

On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models
Rohan Anil, S. Gadanho, Danya Huang, Nijith Jacob, Zhuoshu Li, ..., Cristina Pop, Kevin Regan, G. Shamir, Rakesh Shivanna, Qiqi Yan
12 Sep 2022 · 3DV

Critical Bach Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One
Hideaki Iiduka
21 Aug 2022 · ODL

ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, A. Raju, ..., Andrew Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure
19 Jul 2022 · CLL

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang, S. Shi, Wei Wang, Bo-wen Li
30 Jun 2022

FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models
Yunzhuo Liu, Bo Jiang, Tian Guo, Zimeng Huang, Wen-ping Ma, Xinbing Wang, Chenghu Zhou
28 Apr 2022

Improving the Robustness of Federated Learning for Severely Imbalanced Datasets
Debasrita Chakraborty, Ashish Ghosh
28 Apr 2022 · FedML

Pathways: Asynchronous Distributed Dataflow for ML
P. Barham, Aakanksha Chowdhery, J. Dean, Sanjay Ghemawat, Steven Hand, ..., Parker Schuh, Ryan Sepassi, Laurent El Shafey, C. A. Thekkath, Yonghui Wu
23 Mar 2022 · GNN · MoE

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao
07 Mar 2022

TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Weiyang Wang, Moein Khazraee, Zhizhen Zhong, M. Ghobadi, Zhihao Jia, Dheevatsa Mudigere, Ying Zhang, A. Kewitsch
01 Feb 2022

Accelerate Model Parallel Training by Using Efficient Graph Traversal Order in Device Placement
Tianze Wang, A. H. Payberah, D. Hagos, Vladimir Vlassov
21 Jan 2022 · GNN

COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression
Sian Jin, Chengming Zhang, Xintong Jiang, Yunhe Feng, Hui Guan, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao
18 Nov 2021

Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You
01 Nov 2021

Trade-offs of Local SGD at Scale: An Empirical Study
Jose Javier Gonzalez Ortiz, Jonathan Frankle, Michael G. Rabbat, Ari S. Morcos, Nicolas Ballas
15 Oct 2021 · FedML

A Loss Curvature Perspective on Training Instability in Deep Learning
Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David E. Cardoze, George E. Dahl, Zachary Nado, Orhan Firat
08 Oct 2021 · ODL

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
29 Sep 2021

The Number of Steps Needed for Nonconvex Optimization of a Deep Learning Optimizer is a Rational Function of Batch Size
Hideaki Iiduka
26 Aug 2021

Shift-Curvature, SGD, and Generalization
Arwen V. Bradley, C. Gomez-Uribe, Manish Reddy Vuyyuru
21 Aug 2021

Logit Attenuating Weight Normalization
Aman Gupta, R. Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Keerthi
12 Aug 2021

Large-Scale Differentially Private BERT
Rohan Anil, Badih Ghazi, Vineet Gupta, Ravi Kumar, Pasin Manurangsi
03 Aug 2021

Implicit Gradient Alignment in Distributed and Federated Learning
Yatin Dandi, Luis Barba, Martin Jaggi
25 Jun 2021 · FedML

Randomness In Neural Network Training: Characterizing The Impact of Tooling
Donglin Zhuang, Xingyao Zhang, Shuaiwen Leon Song, Sara Hooker
22 Jun 2021

On Large-Cohort Training for Federated Learning
Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith
15 Jun 2021 · FedML

NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning
Minghan Yang, Dong Xu, Qiwen Cui, Zaiwen Wen, Pengxiang Xu
14 Jun 2021

Federated Learning with Buffered Asynchronous Aggregation
John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael G. Rabbat, Mani Malek, Dzmitry Huba
11 Jun 2021 · FedML