ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

4 October 2019
Samyam Rajbhandari
Jeff Rasley
Olatunji Ruwase
Yuxiong He
    ALM
    AI4CE

Papers citing "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models"

50 / 161 papers shown
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving
Haoyuan Wu
Xueyi Chen
Rui Ming
Jilong Gao
Shoubo Hu
Zhuolun He
Bei Yu
LRM
24
0
0
19 May 2025
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
C. Jin
Ziheng Jiang
Zhihao Bai
Zheng Zhong
Jing Liu
...
Yanghua Peng
Xuanzhe Liu
Xin Jin
Xin Liu
MoE
12
0
0
16 May 2025
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
Ke Wang
Junting Pan
Linda Wei
Aojun Zhou
Weikang Shi
...
Han Xiao
Yiran Yang
Houxing Ren
Mingjie Zhan
Hongsheng Li
29
0
0
15 May 2025
Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios
Huafeng Shi
Jianzhong Liang
Rongchang Xie
Xian Wu
Cheng Chen
Chang Liu
VGen
22
0
0
14 May 2025
On the Robustness of Reward Models for Language Model Alignment
Jiwoo Hong
Noah Lee
Eunki Kim
Guijin Son
Woojin Chung
Aman Gupta
Shao Tang
James Thorne
29
0
0
12 May 2025
Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding
Dawei Huang
Qing Li
Chuan Yan
Zebang Cheng
Jiaming Ji
Xiang Li
Yangqiu Song
Xiaobei Wang
Zheng Lian
Xiaojiang Peng
29
0
0
10 May 2025
Understanding Stragglers in Large Model Training Using What-if Analysis
Jinkun Lin
Ziheng Jiang
Zuquan Song
Sida Zhao
Menghan Yu
...
Shuguang Wang
Yanghua Peng
Xin Liu
Aurojit Panda
Jinyang Li
44
0
0
09 May 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Lei Li
AI4TS
187
0
0
02 May 2025
Scalable Meta-Learning via Mixed-Mode Differentiation
Iurii Kemaev
Dan A Calian
Luisa M Zintgraf
Gregory Farquhar
H. V. Hasselt
57
0
0
01 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Kaipeng Zhang
Lizhuang Ma
Yufei Guo
Jun Wang
Wenbo Zhang
MQ
57
0
0
01 May 2025
Galvatron: An Automatic Distributed System for Efficient Foundation Model Training
Xinyi Liu
Yufei Wang
Shenhan Zhu
Fangcheng Fu
Qingshuo Liu
Guangming Lin
Bin Cui
GNN
155
0
0
30 Apr 2025
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Athinagoras Skiadopoulos
Mark Zhao
Swapnil Gandhi
Thomas Norrie
Shrijeet Mukherjee
Christos Kozyrakis
MoE
91
0
0
28 Apr 2025
The Big Send-off: High Performance Collectives on GPU-based Supercomputers
Siddharth Singh
Mahua Singh
A. Bhatele
54
0
0
25 Apr 2025
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
Dennis Liu
Zijie Yan
Xin Yao
Tong Liu
V. Korthikanti
...
Jiajie Yao
Chandler Zhou
David Wu
Xipeng Li
J. Yang
MoE
70
0
0
21 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Zhiqi Huang
Zihao Huang
Zijia Zhao
Zhengzhang Chen
Zongyu Lin
MLLM
VLM
MoE
219
4
0
10 Apr 2025
TAGC: Optimizing Gradient Communication in Distributed Transformer Training
Igor Polyakov
Alexey Dukhanov
Egor Spirin
46
0
0
08 Apr 2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano
Takumi Ito
Jun Suzuki
LRM
52
1
0
05 Apr 2025
MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs
Juncheng Wu
Wenlong Deng
X. Li
Sheng Liu
Taomian Mi
...
Yihan Cao
Hui Ren
Xuzhao Li
Xiaoxiao Li
Yuyin Zhou
AI4MH
LRM
61
3
0
01 Apr 2025
Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training
Yijie Zheng
Bangjun Xiao
Lei Shi
Xiaoyang Li
Faming Wu
Tianyu Li
Xuefeng Xiao
Wenjie Qu
Yansen Wang
Shouda Liu
MLLM
MoE
69
1
0
31 Mar 2025
Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use
Nicholas Roth
Christopher Hidey
Lucas Spangher
William Arnold
Chang Ye
Nick Masiewicki
Jinoo Baek
Peter Grabowski
Eugene Ie
LLMAG
58
0
0
29 Mar 2025
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
Jinjin Zhang
Qiuyu Huang
Junjie Liu
Xiefan Guo
Di Huang
62
2
0
24 Mar 2025
From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment
J. Li
Jian Guan
Songhao Wu
Wei Wu
Rui Yan
70
1
0
19 Mar 2025
RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving
Wenqi Jiang
Suvinay Subramanian
Cat Graves
Gustavo Alonso
Amir Yazdanbakhsh
Vidushi Dadu
49
6
0
18 Mar 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Wenhan Yang
Yanan Zheng
Zeyu Li
Zifan He
Shenyang Tong
Hailei Gong
LRM
90
2
0
16 Mar 2025
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
Hao Mark Chen
S. Hu
Wayne Luk
Timothy M. Hospedales
Hongxiang Fan
MoMe
72
0
0
16 Mar 2025
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin
Shohaib Mahmud
Haiying Shen
Anand Iyer
MoE
200
0
0
10 Mar 2025
Seesaw: High-throughput LLM Inference via Model Re-sharding
Qidong Su
Wei Zhao
Xuelong Li
Muralidhar Andoorveedu
Chenhao Jiang
Zhanda Zhu
Kevin Song
Christina Giannoula
Gennady Pekhimenko
LRM
77
0
0
09 Mar 2025
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Yingfeng Luo
Tong Zheng
Yongyu Mu
Yangqiu Song
Qinghong Zhang
...
Ziqiang Xu
Peinan Feng
Xiaoqian Liu
Tong Xiao
Jingbo Zhu
AI4CE
212
0
0
09 Mar 2025
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Huatong Song
Jinhao Jiang
Yingqian Min
Jie Chen
Z. Chen
Wayne Xin Zhao
Lei Fang
Zhicheng Dou
AI4TS
LRM
KELM
97
14
0
07 Mar 2025
Robust Learning of Diverse Code Edits
Tushar Aggarwal
Swayam Singh
Abhijeet Awasthi
Aditya Kanade
Nagarajan Natarajan
SyDa
201
0
0
05 Mar 2025
FANformer: Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong
Ge Li
Xue Jiang
Yongding Tao
Kechi Zhang
...
Huanyu Liu
Jiazheng Ding
Jia Li
Jinliang Deng
Hong Mei
AI4TS
46
0
0
28 Feb 2025
Stochastic Rounding for LLM Training: Theory and Practice
Kaan Ozkara
Tao Yu
Youngsuk Park
43
0
0
27 Feb 2025
UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings
Layba Fiaz
Munief Hassan Tahir
Sana Shams
Sarmad Hussain
51
0
0
24 Feb 2025
Learning to Retrieve and Reason on Knowledge Graph through Active Self-Reflection
Han Zhang
Langshi Zhou
Hanfang Yang
LRM
RALM
ReLM
KELM
209
1
0
24 Feb 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du
Weigao Sun
Disen Lan
Jiaxi Hu
Yu-Xi Cheng
KELM
75
3
0
19 Feb 2025
SegSub: Evaluating Robustness to Knowledge Conflicts and Hallucinations in Vision-Language Models
Peter Carragher
Nikitha Rao
Abhinand Jha
R Raghav
Kathleen M. Carley
VLM
56
0
0
19 Feb 2025
FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference
Bingzhe Zhao
Ke Cheng
Aomufei Yuan
Yuxuan Tian
Ruiguang Zhong
Chengchen Hu
Tong Yang
Lian Yu
51
0
0
19 Feb 2025
MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression
Linjie Mu
Zhongzhen Huang
Shengqian Qin
Yakun Zhu
S. Zhang
Xiaofan Zhang
44
1
0
17 Feb 2025
Understanding Silent Data Corruption in LLM Training
Jeffrey Ma
Hengzhi Pei
Leonard Lausen
George Karypis
42
0
0
17 Feb 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Guoqing Ma
Haoyang Huang
K. Yan
L. Chen
Nan Duan
...
Yansen Wang
Yuanwei Lu
Yu-Cheng Chen
Yu-Juan Luo
Yihao Luo
DiffM
VGen
177
18
0
14 Feb 2025
Typhoon T1: An Open Thai Reasoning Model
Pittawat Taveekitworachai
Potsawee Manakul
Kasima Tharnpipitchai
Kunat Pipatanakul
OffRL
LRM
102
0
0
13 Feb 2025
GoRA: Gradient-driven Adaptive Low Rank Adaptation
Haonan He
Peng Ye
Yuchen Ren
Yuan Yuan
Lei Chen
AI4TS
AI4CE
211
0
0
13 Feb 2025
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
Siddharth Singh
Prajwal Singhania
Aditya K. Ranjan
John Kirchenbauer
Jonas Geiping
...
Abhimanyu Hans
Manli Shu
Aditya Tomar
Tom Goldstein
A. Bhatele
102
2
0
12 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
217
0
0
04 Feb 2025
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao
Sen Zhang
Liang Ding
Yuqi Zhang
Lefei Zhang
Dacheng Tao
86
3
0
31 Jan 2025
Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis
Robinson Umeike
N. Getty
Fangfang Xia
Rick L. Stevens
37
2
0
28 Jan 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Min Zhang
LM&MA
AILaw
98
154
0
28 Jan 2025
360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation
Hamed Firooz
Maziar Sanjabi
Adrian Englhardt
Aman Gupta
Ben Levine
...
Xiaoling Zhai
Ya Xu
Yu Wang
Yun Dai
ALM
49
3
0
27 Jan 2025
Learning Versatile Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
190
0
0
22 Jan 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
217
0
0
08 Jan 2025