Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2409.13198
Cited By
Exploring Scaling Laws for Local SGD in Large Language Model Training
20 September 2024
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Exploring Scaling Laws for Local SGD in Large Language Model Training"
7 / 7 papers shown
Title
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
104
63
0
11 Jun 2024
TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang
Guangtao Zeng
Tianduo Wang
Wei Lu
ALM
LRM
112
381
0
04 Jan 2024
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
137
1,915
0
29 Mar 2022
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan
Mohammad Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
74
667
0
09 Apr 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
466
4,662
0
23 Jan 2020
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
242
8,030
0
13 Aug 2016
Communication-Efficient Learning of Deep Networks from Decentralized Data
H. B. McMahan
Eider Moore
Daniel Ramage
S. Hampson
Blaise Agüera y Arcas
FedML
246
17,328
0
17 Feb 2016
1