ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

ZeRO-Offload: Democratizing Billion-Scale Model Training

18 January 2021
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
MoE

Papers citing "ZeRO-Offload: Democratizing Billion-Scale Model Training"

50 / 254 papers shown
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
C. Jin, Ziheng Jiang, Zhihao Bai, Zheng Zhong, J. Liu, ..., Yanghua Peng, Haibin Lin, Xuanzhe Liu, Xin Jin, Xin Liu
MoE · 16 May 2025

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su, Linjie Li, Mingyang Song, Yunzhuo Hao, Zhengyuan Yang, ..., Guanjie Chen, Jiawei Gu, Juntao Li, Xiaoye Qu, Yu Cheng
OffRL · LRM · 13 May 2025

Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Athinagoras Skiadopoulos, Mark Zhao, Swapnil Gandhi, Thomas Norrie, Shrijeet Mukherjee, Christos Kozyrakis
MoE · 28 Apr 2025

Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, J. Li, Yixin Ji, Z. Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Z. Wang, Baoxing Huai, M. Zhang
LLMAG · 28 Apr 2025

Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar, Ali Etemad
16 Apr 2025

TAGC: Optimizing Gradient Communication in Distributed Transformer Training
Igor Polyakov, Alexey Dukhanov, Egor Spirin
08 Apr 2025

Accurate GPU Memory Prediction for Deep Learning Jobs through Dynamic Analysis
Jiabo Shi, Yehia Elkhatib
3DH · VLM · 04 Apr 2025

LandMarkSystem Technical Report
Zhenxiang Ma, Zhenyu Yang, Miao Tao, Yuanzhen Zhou, Zeyu He, Yuchang Zhang, Rong Fu, Hengjie Li
3DGS · 27 Mar 2025

Gemma 3 Technical Report
Gemma Team, Aishwarya B Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, ..., Harshal Tushar Lehri, Hussein Hazimeh, Ian Ballantyne, Idan Szpektor, Ivan Nardini
VLM · 25 Mar 2025

Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
Zhanda Zhu, Christina Giannoula, Muralidhar Andoorveedu, Qidong Su, Karttikeya Mangalam, Bojian Zheng, Gennady Pekhimenko
VLM · MoE · 24 Mar 2025

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang, Qiuyu Huang, Junjie Liu, Xiefan Guo, Di Huang
24 Mar 2025

WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training
Z. Wang, Anna Cai, Xinfeng Xie, Zaifeng Pan, Yue Guan, ..., Shikai Li, Jianyu Huang, Chris Cai, Yuchen Hao, Yufei Ding
23 Mar 2025

ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory
Liangyu Wang, Jie Ren, Hang Xu, Junxiao Wang, Huanyi Xie, David E. Keyes, Di Wang
16 Mar 2025

Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary B. Charles, Gabriel Teston, Lucio Dery, Keith Rush, Nova Fallen, Zachary Garrett, Arthur Szlam, Arthur Douillard
12 Mar 2025

FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference
Hongchao Du, Shangyu Wu, Arina Kharlamova, Nan Guan, Chun Jason Xue
04 Mar 2025

Teaching Metric Distance to Autoregressive Multimodal Foundational Models
Jiwan Chung, Saejin Kim, Yongrae Jo, J. Park, Dongjun Min, Youngjae Yu
04 Mar 2025

Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
Wenjie Qiu, Yi-Chen Li, Xuqin Zhang, Tianyi Zhang, Y. Zhang, Zongzhang Zhang, Yang Yu
ALM · 01 Mar 2025

When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models
Weilan Wang, Yu Mao, Dongdong Tang, Hongchao Du, Nan Guan, Chun Jason Xue
MQ · 24 Feb 2025

SQLong: Enhanced NL2SQL for Longer Contexts with LLMs
D. Q. Nguyen, Cong Duy Vu Hoang, Duy Vu, Gioacchino Tangari, Thanh Vu, Don Dharmasiri, Yuan-Fang Li, Long Duong
23 Feb 2025

APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
Yuxiang Huang, Mingye Li, Xu Han, Chaojun Xiao, Weilin Zhao, Sun Ao, Hao Zhou, Jie Zhou, Zhiyuan Liu, Maosong Sun
17 Feb 2025

InsBank: Evolving Instruction Subset for Ongoing Alignment
Jiayi Shi, Yiwei Li, Shaoxiong Feng, Peiwen Yuan, X. U. Wang, ..., Chuyi Tan, Boyuan Pan, Huan Ren, Yao Hu, Kan Li
ALM · 17 Feb 2025

PiKE: Adaptive Data Mixing for Multi-Task Learning Under Low Gradient Conflicts
Zeman Li, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni
MoMe · 10 Feb 2025

Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline
Zhiyuan Fang, Yuegui Huang, Zicong Hong, Yufeng Lyu, Wuhui Chen, Yue Yu, Fan Yu, Zibin Zheng
MoE · 09 Feb 2025

A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, Erik Cambria
LM&MA · AILaw · 28 Jan 2025

A Survey on Memory-Efficient Large-Scale Model Training in AI for Science
Kaiyuan Tian, Linbo Qiao, Baihui Liu, Gongqingjian Jiang, Dongsheng Li
21 Jan 2025

On the Consideration of AI Openness: Can Good Intent Be Abused?
Yeeun Kim, Eunkyung Choi, Hyunjun Kim, Hongseok Oh, Hyunseo Shin, Wonseok Hwang
SILM · 08 Jan 2025

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
144 citations · 30 Dec 2024

Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models
Yuchen Fan, Yuzhong Hong, Qiushi Wang, Junwei Bao, Hongfei Jiang, Yang Song
17 Dec 2024

Echo: Simulating Distributed Training At Scale
Yicheng Feng, Yuetao Chen, Kaiwen Chen, Jingzong Li, Tianyuan Wu, Peng Cheng, Chuan Wu, Wei Wang, Tsung-Yi Ho, Hong Xu
17 Dec 2024

Towards Adaptive Mechanism Activation in Language Agent
Ziyang Huang, Jun Zhao, Kang-Jun Liu
LLMAG · AI4CE · 01 Dec 2024

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao, S. Sang, Tiancheng Zhi, Jing Liu, Qing Yan, Linjie Luo, Bo Yuan
VLM · 26 Nov 2024

Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng-rong Li
24 Nov 2024

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn
20 Nov 2024

CULL-MT: Compression Using Language and Layer pruning for Machine Translation
Pedram Rostami, M. Dousti
10 Nov 2024

Accelerating Large Language Model Training with 4D Parallelism and Memory Consumption Estimator
Kazuki Fujii, Kohei Watanabe, Rio Yokota
10 Nov 2024

Photon: Federated LLM Pre-Training
Lorenzo Sani, Alex Iacob, Zeyu Cao, Royson Lee, Bill Marino, ..., Dongqi Cai, Zexi Li, Wanru Zhao, Xinchi Qiu, Nicholas D. Lane
AI4CE · 05 Nov 2024

PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption
Yifan Tan, Cheng Tan, Zeyu Mi, Haibo Chen
04 Nov 2024

Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
André Storhaug, Jingyue Li
ALM · 04 Nov 2024

$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal, Tian Yun, Nihal V. Nayak, Jack Merullo, Stephen H. Bach, Chen Sun, Ellie Pavlick
VLM · AI4CE · OnRL · 30 Oct 2024

Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading
Avinash Maurya, Jie Ye, M. Rafique, Franck Cappello, Bogdan Nicolae
26 Oct 2024

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Z. Wang
MoE · 24 Oct 2024

ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference
Xin He, Shunkang Zhang, Yuxin Wang, Haiyan Yin, Zihao Zeng, Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Ivor Tsang, Ong Yew Soon
MoE · 23 Oct 2024

Understanding and Alleviating Memory Consumption in RLHF for LLMs
Jin Zhou, Hanmei Yang, Steven Tang, Mingcan Xiang, Hui Guan, Tongping Liu
21 Oct 2024

Pipeline Gradient-based Model Training on Analog In-memory Accelerators
Zhaoxian Wu, Quan-Wu Xiao, Tayfun Gokmen, H. Tsai, K. E. Maghraoui, Tianyi Chen
19 Oct 2024

Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models
Kai Yao, P. Gao, Lichun Li, Yuan Zhao, Xiaofeng Wang, W. Wang, Jianke Zhu
15 Oct 2024

CursorCore: Assist Programming through Aligning Anything
Hao Jiang, Qi Liu, Rui Li, Shengyu Ye, Shijin Wang
09 Oct 2024

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai, Li-Wei Li, Yiran Chen
08 Oct 2024

Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Ulyana Piterbarg, Lerrel Pinto, Rob Fergus
SyDa · 03 Oct 2024

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
Xi Chen, Kaituo Feng, Changsheng Li, Xunhao Lai, Xiangyu Yue, Ye Yuan, Guoren Wang
02 Oct 2024

FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices
Zhidong Gao, Yu Zhang, Zhenxiao Zhang, Yanmin Gong, Yuanxiong Guo
01 Oct 2024