Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
arXiv:2406.13936

20 June 2024
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar

Papers citing "Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods"

19 papers shown

 1. Nemotron-4 15B Technical Report (26 Feb 2024)
    Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, M. Patwary, Sandeep Subramanian, ..., Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, Jonathan Cohen, Bryan Catanzaro
 2. Asynchronous Local-SGD Training for Language Modeling (17 Jan 2024) [FedML]
    Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato
 3. Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays (15 Jun 2022)
    Konstantin Mishchenko, Francis R. Bach, Mathieu Even, Blake E. Woodworth
 4. AdaScale SGD: A User-Friendly Algorithm for Distributed Training (09 Jul 2020) [ODL]
    Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin
 5. Is Local SGD Better than Minibatch SGD? (18 Feb 2020) [FedML]
    Blake E. Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. B. McMahan, Ohad Shamir, Nathan Srebro
 6. Better Theory for SGD in the Nonconvex World (09 Feb 2020)
    Ahmed Khaled, Peter Richtárik
 7. PyTorch: An Imperative Style, High-Performance Deep Learning Library (03 Dec 2019) [ODL]
    Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, ..., Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala
 8. RoBERTa: A Robustly Optimized BERT Pretraining Approach (26 Jul 2019) [AIMat]
    Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
 9. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (01 Apr 2019) [ODL]
    Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
10. Measuring the Effects of Data Parallelism on Neural Network Training (08 Nov 2018)
    Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang
11. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron (16 Oct 2018)
    Sharan Vaswani, Francis R. Bach, Mark Schmidt
12. Large Scale Language Modeling: Converging on 40GB of Text in Four Hours (03 Aug 2018)
    Raul Puri, Robert M. Kirby, Nikolai Yakovenko, Bryan Catanzaro
13. Local SGD Converges Fast and Communicates Little (24 May 2018) [FedML]
    Sebastian U. Stich
14. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning (18 Dec 2017)
    Siyuan Ma, Raef Bassily, M. Belkin
15. Don't Decay the Learning Rate, Increase the Batch Size (01 Nov 2017) [ODL]
    Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
16. ImageNet Large Scale Visual Recognition Challenge (01 Sep 2014) [VLM, ObjD]
    Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
17. One weird trick for parallelizing convolutional neural networks (23 Apr 2014) [GNN]
    A. Krizhevsky
18. A Proximal Stochastic Gradient Method with Progressive Variance Reduction (19 Mar 2014) [ODL]
    Lin Xiao, Tong Zhang
19. Hybrid Deterministic-Stochastic Methods for Data Fitting (13 Apr 2011)
    M. Friedlander, Mark Schmidt