arXiv: 1812.06162
An Empirical Model of Large-Batch Training
14 December 2018
Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team
Papers citing "An Empirical Model of Large-Batch Training" (23 of 73 shown)
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier [MoE] · 29 / 8 / 0 · 04 Jun 2021

Wirelessly Powered Federated Edge Learning: Optimal Tradeoffs Between Convergence and Power Transfer
Qunsong Zeng, Yuqing Du, Kaibin Huang · 37 / 35 / 0 · 24 Feb 2021

Improved Denoising Diffusion Probabilistic Models
Alex Nichol, Prafulla Dhariwal [DiffM] · 60 / 3,541 / 0 · 18 Feb 2021

Data optimization for large batch distributed training of deep neural networks
Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar · 21 / 1 / 0 · 16 Dec 2020

How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget
Erik Wijmans, Irfan Essa, Dhruv Batra [3DPC] · 30 / 10 / 0 · 11 Dec 2020

Scaling Laws for Autoregressive Generative Modeling
T. Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, ..., Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish · 53 / 408 / 0 · 28 Oct 2020

AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin [ODL] · 30 / 37 / 0 · 09 Jul 2020

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei [BDL] · 97 / 40,302 / 0 · 28 May 2020

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran [MLT] · 39 / 21 / 0 · 15 May 2020

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith [ODL] · 27 / 20 / 0 · 24 Feb 2020

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta, S. Serrano, D. DeCoste [MoMe] · 38 / 55 / 0 · 07 Jan 2020

Dota 2 with Large Scale Deep Reinforcement Learning
OpenAI: Christopher Berner, Greg Brockman, Brooke Chan, ..., Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang [GNN, VLM, CLL, AI4CE, LRM] · 46 / 1,799 / 0 · 13 Dec 2019

Neural Machine Translation: A Review and Survey
Felix Stahlberg [3DV, AI4TS, MedIm] · 28 / 313 / 0 · 04 Dec 2019

RLScheduler: An Automated HPC Batch Job Scheduler Using Reinforcement Learning
Di Zhang, Dong Dai, Youbiao He, F. S. Bao, Bing Xie [OffRL] · 14 / 8 / 0 · 20 Oct 2019

Data Valuation using Reinforcement Learning
Jinsung Yoon, Sercan Ö. Arik, Tomas Pfister [TDI] · 30 / 174 / 0 · 25 Sep 2019

rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch
Adam Stooke, Pieter Abbeel [OffRL] · 24 / 96 / 0 · 03 Sep 2019

Ensemble Model Patching: A Parameter-Efficient Variational Bayesian Neural Network
Oscar Chang, Yuling Yao, David Williams-King, Hod Lipson [BDL, UQCV] · 32 / 8 / 0 · 23 May 2019

Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
Wu Dong, Murat Keçeli, Rafael Vescovi, Hanyu Li, Corey Adams, ..., T. Uram, V. Vishwanath, N. Ferrier, B. Kasthuri, P. Littlewood [FedML, AI4CE] · 22 / 9 / 0 · 13 May 2019

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
Yanghua Peng, Hang Zhang, Yifei Ma, Tong He, Zhi-Li Zhang, Sheng Zha, Mu Li · 28 / 23 / 0 · 26 Apr 2019

Modularization of End-to-End Learning: Case Study in Arcade Games
Andrew Melnik, Sascha Fleer, M. Schilling, Helge J. Ritter [OffRL] · 39 / 12 / 0 · 27 Jan 2019

Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi · 57 / 429 / 0 · 22 Aug 2018

Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark
Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, K. Olukotun, Christopher Ré, Matei A. Zaharia · 13 / 117 / 0 · 04 Jun 2018

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang [ODL] · 310 / 2,896 / 0 · 15 Sep 2016