ZeRO-Offload: Democratizing Billion-Scale Model Training
18 January 2021
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
arXiv: 2101.06840 (PDF / HTML)

Papers citing "ZeRO-Offload: Democratizing Billion-Scale Model Training"

Showing 50 of 254 citing papers
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Alessandro Ottino
Joshua L. Benjamin
G. Zervas
30
7
0
28 Nov 2022
A Simple, Yet Effective Approach to Finding Biases in Code Generation
Spyridon Mouselinos
Mateusz Malinowski
Henryk Michalewski
10
7
0
31 Oct 2022
Accelerating Distributed MoE Training and Inference with Lina
Jiamin Li
Yimin Jiang
Yibo Zhu
Cong Wang
Hong-Yu Xu
MoE
17
58
0
31 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
31
47
0
13 Oct 2022
Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR
Sami Alabed
Dominik Grewe
Juliana Franco
Bart Chrzaszcz
Tom Natan
Tamara Norman
Norman A. Rink
Dimitrios Vytiniotis
Michael Schaarschmidt
MoE
20
1
0
07 Oct 2022
Petals: Collaborative Inference and Fine-tuning of Large Models
Alexander Borzunov
Dmitry Baranchuk
Tim Dettmers
Max Ryabinin
Younes Belkada
Artem Chumachenko
Pavel Samygin
Colin Raffel
VLM
33
62
0
02 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
28
109
0
31 Aug 2022
Training a T5 Using Lab-sized Resources
Manuel R. Ciosici
Leon Derczynski
VLM
28
8
0
25 Aug 2022
On the independence between phenomenal consciousness and computational intelligence
E.C. Garrido-Merchán
S. Lumbreras
AI4CE
22
3
0
03 Aug 2022
Dive into Big Model Training
Qinghua Liu
Yuxiang Jiang
MoMe
AI4CE
LRM
13
3
0
25 Jul 2022
POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging
Shishir G. Patil
Paras Jain
P. Dutta
Ion Stoica
Joseph E. Gonzalez
12
35
0
15 Jul 2022
Training Transformers Together
Alexander Borzunov
Max Ryabinin
Tim Dettmers
Quentin Lhoest
Lucile Saulnier
Michael Diskin
Yacine Jernite
Thomas Wolf
ViT
23
8
0
07 Jul 2022
GACT: Activation Compressed Training for Generic Network Architectures
Xiaoxuan Liu
Lianmin Zheng
Dequan Wang
Yukuo Cen
Weize Chen
...
Zhiyuan Liu
Jie Tang
Joey Gonzalez
Michael W. Mahoney
Alvin Cheung
VLM
GNN
MQ
17
30
0
22 Jun 2022
Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models
Zhiquan Lai
Shengwei Li
Xudong Tang
Ke-shi Ge
Weijie Liu
Yabo Duan
Linbo Qiao
Dongsheng Li
22
39
0
10 Jun 2022
A New Frontier of AI: On-Device AI Training and Personalization
Jijoong Moon
Parichay Kapoor
Ji Hoon Lee
Donghak Park
Seungbaek Hong
Hyungyu Lee
Donghyeon Jeong
Sungsik Kong
MyungJoo Ham
11
3
0
09 Jun 2022
Decentralized Training of Foundation Models in Heterogeneous Environments
Binhang Yuan
Yongjun He
Jared Davis
Tianyi Zhang
Tri Dao
Beidi Chen
Percy Liang
Christopher Ré
Ce Zhang
25
90
0
02 Jun 2022
Can Foundation Models Help Us Achieve Perfect Secrecy?
Simran Arora
Christopher Ré
FedML
21
6
0
27 May 2022
Reducing Activation Recomputation in Large Transformer Models
V. Korthikanti
Jared Casper
Sangkug Lym
Lawrence C. McAfee
M. Andersch
M. Shoeybi
Bryan Catanzaro
AI4CE
27
256
0
10 May 2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Zhen Zhang
Shuai Zheng
Yida Wang
Justin Chiu
George Karypis
Trishul M. Chilimbi
Mu Li
Xin Jin
11
39
0
30 Apr 2022
Prompt Consistency for Zero-Shot Task Generalization
Chunting Zhou
Junxian He
Xuezhe Ma
Taylor Berg-Kirkpatrick
Graham Neubig
VLM
14
74
0
29 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
89
6,004
0
05 Apr 2022
DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation
Yu Tang
Chenyu Wang
Yufan Zhang
Yuliang Liu
Xingcheng Zhang
Linbo Qiao
Zhiquan Lai
Dongsheng Li
21
4
0
30 Mar 2022
Survey on Large Scale Neural Network Training
Julia Gusak
Daria Cherniuk
Alena Shilova
A. Katrutsa
Daniel Bershatsky
...
Lionel Eyraud-Dubois
Oleg Shlyazhko
Denis Dimitrov
Ivan V. Oseledets
Olivier Beaumont
22
10
0
21 Feb 2022
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Youjie Li
Amar Phanishayee
D. Murray
Jakub Tarnawski
N. Kim
11
19
0
02 Feb 2022
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Lianmin Zheng
Zhuohan Li
Hao Zhang
Yonghao Zhuang
Zhifeng Chen
...
Yuanzhong Xu
Danyang Zhuo
Eric P. Xing
Joseph E. Gonzalez
Ion Stoica
MoE
22
104
0
28 Jan 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
27
58
0
31 Dec 2021
Automap: Towards Ergonomic Automated Parallelism for ML Models
Michael Schaarschmidt
Dominik Grewe
Dimitrios Vytiniotis
Adam Paszke
G. Schmid
...
James Molloy
Jonathan Godwin
Norman A. Rink
Vinod Nair
Dan Belov
MoE
17
16
0
06 Dec 2021
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
27
7
0
23 Nov 2021
Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training
C. Karakuş
R. Huilgol
Fei Wu
Anirudh Subramanian
Cade Daniel
D. Çavdar
Teng Xu
Haohan Chen
Arash Rahnama
L. Quintela
MoE
AI4CE
23
28
0
10 Nov 2021
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
Yongbin Li
Hongxin Liu
Zhengda Bian
Boxiang Wang
Haichen Huang
Fan Cui
Chuan-Qing Wang
Yang You
GNN
21
143
0
28 Oct 2021
AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning
Siddharth Singh
A. Bhatele
GNN
18
14
0
25 Oct 2021
Hydra: A System for Large Multi-Model Deep Learning
Kabir Nagrecha
Arun Kumar
MoE
AI4CE
38
5
0
16 Oct 2021
A Short Study on Compressing Decoder-Based Language Models
Tianda Li
Yassir El Mesbahi
I. Kobyzev
Ahmad Rashid
A. Mahmud
Nithin Anchuri
Habib Hajimolahoseini
Yang Liu
Mehdi Rezagholizadeh
91
25
0
16 Oct 2021
M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining
Junyang Lin
An Yang
Jinze Bai
Chang Zhou
Le Jiang
...
Jie M. Zhang
Yong Li
Wei Lin
Jingren Zhou
Hongxia Yang
MoE
92
43
0
08 Oct 2021
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim
A. A. Awan
Alexandre Muzio
Andres Felipe Cruz Salinas
Liyang Lu
Amr Hendy
Samyam Rajbhandari
Yuxiong He
Hany Awadalla
MoE
96
84
0
22 Sep 2021
Can the Transformer Be Used as a Drop-in Replacement for RNNs in Text-Generating GANs?
Kevin Blin
Andrei Kucharavy
8
2
0
26 Aug 2021
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
Jiarui Fang
Zilin Zhu
Shenggui Li
Hui Su
Yang Yu
Jie Zhou
Yang You
VLM
29
24
0
12 Aug 2021
Long-term series forecasting with Query Selector -- efficient model of sparse attention
J. Klimek
Jakub Klímek
W. Kraskiewicz
Mateusz Topolewski
AI4TS
20
6
0
19 Jul 2021
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li
Torsten Hoefler
GNN
AI4CE
LRM
77
131
0
14 Jul 2021
JUWELS Booster -- A Supercomputer for Large-Scale AI Research
Stefan Kesselheim
A. Herten
K. Krajsek
J. Ebert
J. Jitsev
...
A. Strube
Roshni Kamath
Martin G. Schultz
M. Riedel
T. Lippert
GNN
25
14
0
30 Jun 2021
Distributed Deep Learning in Open Collaborations
Michael Diskin
Alexey Bukhtiyarov
Max Ryabinin
Lucile Saulnier
Quentin Lhoest
...
Denis Mazur
Ilia Kobelev
Yacine Jernite
Thomas Wolf
Gennady Pekhimenko
FedML
33
54
0
18 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
37
815
0
14 Jun 2021
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier
MoE
13
8
0
04 Jun 2021
M6-T: Exploring Sparse Expert Models and Beyond
An Yang
Junyang Lin
Rui Men
Chang Zhou
Le Jiang
...
Dingyang Zhang
Wei Lin
Lin Qu
Jingren Zhou
Hongxia Yang
MoE
31
24
0
31 May 2021
Sequence Parallelism: Long Sequence Training from System Perspective
Shenggui Li
Fuzhao Xue
Chaitanya Baranwal
Yongbin Li
Yang You
14
90
0
26 May 2021
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
Jianfei Chen
Lianmin Zheng
Z. Yao
Dequan Wang
Ion Stoica
Michael W. Mahoney
Joseph E. Gonzalez
MQ
19
74
0
29 Apr 2021
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
GNN
32
367
0
16 Apr 2021
An Efficient 2D Method for Training Super-Large Deep Learning Models
Qifan Xu
Shenggui Li
Chaoyu Gong
Yang You
19
0
0
12 Apr 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan
M. Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
11
645
0
09 Apr 2021
Pinpointing the Memory Behaviors of DNN Training
Jiansong Li
Xiao-jun Dong
Guangli Li
Peng Zhao
Xueying Wang
...
Yongxin Yang
Zihan Jiang
Wei Cao
Lei Liu
Xiaobing Feng
13
1
0
01 Apr 2021