Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier · 4 June 2021 · arXiv:2106.02679 · Tags: MoE

Papers citing "Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models" (5 of 5 papers shown)

PosterLlama: Bridging Design Ability of Langauge Model to Contents-Aware Layout Generation
Jaejung Seol, Seojun Kim, Jaejun Yoo · Tags: 3DV, VLM · 36 / 7 / 0 · 01 Apr 2024

ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He · Tags: MoE · 177 / 416 / 0 · 18 Jan 2021

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · Tags: VLM · 285 / 2,017 / 0 · 28 Jul 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei · 264 / 4,489 / 0 · 23 Jan 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · Tags: MoE · 245 / 1,826 / 0 · 17 Sep 2019