ZeRO-Offload: Democratizing Billion-Scale Model Training
arXiv:2101.06840 · 18 January 2021
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
MoE

Papers citing "ZeRO-Offload: Democratizing Billion-Scale Model Training"

50 / 254 papers shown
HybridFlow: A Flexible and Efficient RLHF Framework
Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Size Zheng, Haibin Lin, Chuan Wu
AI4CE · 28 Sep 2024

PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training
Daiyaan Arfeen, Zhen Zhang, Xinwei Fu, G. R. Ganger, Yida Wang
AI4CE · 23 Sep 2024

Achieving Peak Performance for Large Language Models: A Systematic Review
Z. R. K. Rostam, Sándor Szénási, Gábor Kertész
07 Sep 2024

LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
Mo Sun, Zihan Yang, Changyue Liao, Yingtao Li, Fei Wu, Zeke Wang
02 Sep 2024

Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
WenZheng Zhang, Yang Hu, Jing Shi, Xiaoying Bai
22 Aug 2024
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
Shuzhang Zhong, Ling Liang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li
MoE · 19 Aug 2024

Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
Xinyi Zhang, Hanyu Zhao, Wencong Xiao, Xianyan Jia, Fei Xu, Yong Li, Wei Lin, Fangming Liu
16 Aug 2024

Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, ..., Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
AAML · MU · 01 Aug 2024

Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team: Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, ..., Noah Fiedel, Armand Joulin, Kathleen Kenealy, Robert Dadashi, Alek Andreev
VLM · MoE · OSLM · 31 Jul 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun
29 Jul 2024

Halu-J: Critique-Based Hallucination Judge
Binjie Wang, Steffi Chern, Ethan Chern, Pengfei Liu
HILM · 17 Jul 2024

TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing
Husheng Han, Xinyao Zheng, Yuanbo Wen, Yifan Hao, Erhu Feng, ..., Pengwei Jin, Xinkai Song, Zidong Du, Qi Guo, Xing Hu
12 Jul 2024

Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper
Gabin Schieffer, Jacob Wahlgren, Jie Ren, Jennifer Faj, Ivy Bo Peng
10 Jul 2024

Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang
09 Jul 2024
LLMBox: A Comprehensive Library for Large Language Models
Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, ..., Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
08 Jul 2024

PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs
Dan Peng, Zhihui Fu, Jun Wang
01 Jul 2024

WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem
Ziming Liu, Shaoyu Wang, Shenggan Cheng, Zhongkai Zhao, Xuanlei Zhao, James Demmel, Yang You
30 Jun 2024

LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism
Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, ..., Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin Jin, Xuanzhe Liu
LRM · 26 Jun 2024

Timo: Towards Better Temporal Reasoning for Language Models
Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng
LRM · 20 Jun 2024
How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment
Heyan Huang, Yinghao Li, Huashan Sun, Yu Bai, Yang Gao
17 Jun 2024

Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
Junru Lu, Jiazheng Li, Siyu An, Meng Zhao, Yulan He, Di Yin, Xing Sun
16 Jun 2024

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
Avinash Maurya, Jie Ye, M. Rafique, Franck Cappello, Bogdan Nicolae
15 Jun 2024

Practical offloading for fine-tuning LLM on commodity GPU via learned sparse projectors
Siyuan Chen, Zelong Guan, Yudong Liu, Phillip B. Gibbons
14 Jun 2024

Optimizing Large Model Training through Overlapped Activation Recomputation
Ping Chen, Wenjie Zhang, Shuibing He, Yingjie Gu, Zhuwei Peng, ..., Yi Zheng, Zhefeng Wang, Yanlong Yin, Gang Chen
13 Jun 2024
ProTrain: Efficient LLM Training via Memory-Aware Techniques
Hanmei Yang, Jin Zhou, Yao Fu, Xiaoqun Wang, Ramine Roane, Hui Guan, Tongping Liu
VLM · 12 Jun 2024

CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation
Renhao Li, Minghuan Tan, Derek F. Wong, Min Yang
LLMAG · 11 Jun 2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
Li-Wen Chang, Yiyuan Ma, Qi Hou, Chengquan Jiang, Ningxin Zheng, ..., Zuquan Song, Ziheng Jiang, Yanghua Peng, Xuanzhe Liu, Xin Liu
11 Jun 2024

MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter
Jitai Hao, Weiwei Sun, Xin Xin, Qi Meng, Zhumin Chen, Pengjie Ren, Zhaochun Ren
MoE · 07 Jun 2024

Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun
05 Jun 2024
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin
04 Jun 2024

A Study of Optimizations for Fine-tuning Large Language Models
Arjun Singh, Nikhil Pandey, Anup Shirgaonkar, Pavan Manoj, Vijay Aski
04 Jun 2024

PETRA: Parallel End-to-end Training with Reversible Architectures
Stephane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon
04 Jun 2024

ACCO: Accumulate While You Communicate for Communication-Overlapped Sharded LLM Training
Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, Edouard Oyallon
FedML · 03 Jun 2024

Automatic Instruction Evolving for Large Language Models
Weihao Zeng, Can Xu, Yingxiu Zhao, Jianguang Lou, Weizhu Chen
SyDa · 02 Jun 2024

MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models
Taehyun Kim, Kwanseok Choi, Youngmock Cho, Jaehoon Cho, Hyukzae Lee, Jaewoong Sim
MoE · 29 May 2024
2BP: 2-Stage Backpropagation
Christopher Rae, Joseph K. L. Lee, James Richings
MoE · MQ · 28 May 2024

TURNIP: A "Nondeterministic" GPU Runtime with CPU RAM Offload
Zhimin Ding, Jiawen Yao, Brianna Barrow, Tania Lorido-Botran, Christopher M. Jermaine, Yu-Shuen Tang, Jiehui Li, Xinyu Yao, Sleem Mahmoud Abdelghafar, Daniel Bourgeois
25 May 2024

PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtárik
MQ · 23 May 2024

SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures
Swapnil Gandhi, Mark Zhao, Athinagoras Skiadopoulos, Christos Kozyrakis
AI4CE · GNN · 22 May 2024

The Future of Large Language Model Pre-training is Federated
Lorenzo Sani, Alexandru Iacob, Zeyu Cao, Bill Marino, Yan Gao, ..., Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane
AI4CE · 17 May 2024
USP: A Unified Sequence Parallelism Approach for Long Context Generative AI
Jiarui Fang, Shangchun Zhao
13 May 2024

ColA: Collaborative Adaptation with Gradient Learning
Enmao Diao, Qi Le, Suya Wu, Xinran Wang, Ali Anwar, Jie Ding, Vahid Tarokh
22 Apr 2024

Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier
A. Tsaris, Chengming Zhang, Xiao Wang, Junqi Yin, Siyan Liu, ..., Jong Youl Choi, M. Wahib, Dan Lu, Prasanna Balaprakash, Feiyi Wang
17 Apr 2024

Pretraining Billion-scale Geospatial Foundational Models on Frontier
A. Tsaris, P. Dias, Abhishek Potnis, Junqi Yin, Feiyi Wang, D. Lunga
AI4CE · 17 Apr 2024

LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs
Taeho Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha
16 Apr 2024
Lowering PyTorch's Memory Consumption for Selective Differentiation
Samarth Bhatia, Felix Dangel
15 Apr 2024

NoticIA: A Clickbait Article Summarization Dataset in Spanish
Iker García-Ferrero, Begoña Altuna
11 Apr 2024

Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics
Zhengde Zhang, Yiyu Zhang, Haodong Yao, Jianwen Luo, Rui Zhao, ..., Ke Li, Lina Zhao, Jun Cao, Fazhi Qi, Changzheng Yuan
08 Apr 2024

BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models
Qi Luo, Hengxu Yu, Xiao Li
03 Apr 2024

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang
26 Mar 2024