ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1910.02054
  4. Cited By
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

4 October 2019
Samyam Rajbhandari
Jeff Rasley
Olatunji Ruwase
Yuxiong He
    ALM
    AI4CE
ArXivPDFHTML

Papers citing "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models"

50 / 164 papers shown
Title
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Mucong Ding
Souradip Chakraborty
Vibhu Agrawal
Zora Che
Alec Koppel
Mengdi Wang
Amrit Singh Bedi
Furong Huang
47
10
0
21 Jun 2024
Learn Beyond The Answer: Training Language Models with Reflection for
  Mathematical Reasoning
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
Zhihan Zhang
Zhenwen Liang
Wenhao Yu
Dian Yu
Mengzhao Jia
Dong Yu
Meng Jiang
AIMat
RALM
LRM
ReLM
43
13
0
17 Jun 2024
Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities
Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities
Zhonghao Li
Xuming Hu
Aiwei Liu
Kening Zheng
S. Huang
Hui Xiong
RALM
115
8
0
17 Jun 2024
Optimizing Large Model Training through Overlapped Activation Recomputation
Optimizing Large Model Training through Overlapped Activation Recomputation
Ping Chen
Wenjie Zhang
Shuibing He
Yingjie Gu
Zhuwei Peng
...
Yi Zheng
Zhefeng Wang
Yanlong Yin
Gang Chen
Gang Chen
35
5
0
13 Jun 2024
ProTrain: Efficient LLM Training via Memory-Aware Techniques
ProTrain: Efficient LLM Training via Memory-Aware Techniques
Hanmei Yang
Jin Zhou
Yao Fu
Xiaoqun Wang
Ramine Roane
Hui Guan
Tongping Liu
VLM
36
0
0
12 Jun 2024
FaceGPT: Self-supervised Learning to Chat about 3D Human Faces
FaceGPT: Self-supervised Learning to Chat about 3D Human Faces
Haoran Wang
Mohit Mendiratta
Christian Theobalt
Adam Kortylewski
3DH
CVBM
31
3
0
11 Jun 2024
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Joongwon Kim
Bhargavi Paranjape
Tushar Khot
Hannaneh Hajishirzi
LM&Ro
ELM
LLMAG
LRM
46
9
0
10 Jun 2024
Margin-aware Preference Optimization for Aligning Diffusion Models
  without Reference
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Jiwoo Hong
Sayak Paul
Noah Lee
Kashif Rasul
James Thorne
Jongheon Jeong
43
13
0
10 Jun 2024
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large
  Language Model Training
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Ao Sun
Weilin Zhao
Xu Han
Cheng Yang
Zhiyuan Liu
Chuan Shi
Maosong Sun
31
7
0
05 Jun 2024
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and
  Social Experiences
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
Yidong Huang
Jacob Sansom
Ziqiao Ma
Felix Gervits
Joyce Chai
44
17
0
05 Jun 2024
PETRA: Parallel End-to-end Training with Reversible Architectures
PETRA: Parallel End-to-end Training with Reversible Architectures
Stéphane Rivaud
Louis Fournier
Thomas Pumir
Eugene Belilovsky
Michael Eickenberg
Edouard Oyallon
25
0
0
04 Jun 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for
  Low-Memory GPUs
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong
Manasa Bharadwaj
47
5
0
30 May 2024
Empowering Character-level Text Infilling by Eliminating Sub-Tokens
Empowering Character-level Text Infilling by Eliminating Sub-Tokens
Houxing Ren
Mingjie Zhan
Zhongyuan Wu
Hongsheng Li
AI4CE
32
1
0
27 May 2024
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off
  Code Generation
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
Houxing Ren
Mingjie Zhan
Zhongyuan Wu
Aojun Zhou
Junting Pan
Hongsheng Li
SyDa
42
7
0
27 May 2024
Scaling Laws for Discriminative Classification in Large Language Models
Scaling Laws for Discriminative Classification in Large Language Models
Dean Wyatte
Fatemeh Tahmasbi
Ming Li
Thomas Markovich
47
2
0
24 May 2024
Sparse Matrix in Large Language Model Fine-tuning
Sparse Matrix in Large Language Model Fine-tuning
Haoze He
Juncheng Billy Li
Xuan Jiang
Heather Miller
MoE
27
3
0
24 May 2024
DEPTH: Discourse Education through Pre-Training Hierarchically
DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger
Ofek Glick
Chaim Baskin
Yonatan Belinkov
67
0
0
13 May 2024
Folded Context Condensation in Path Integral Formalism for Infinite Context Transformers
Folded Context Condensation in Path Integral Formalism for Infinite Context Transformers
Won-Gi Paeng
Daesuk Kwon
Kyungwon Jeong
Honggyo Suh
71
0
0
07 May 2024
Towards Green AI: Current status and future research
Towards Green AI: Current status and future research
Christian Clemm
Lutz Stobbe
Kishan Wimalawarne
Jan Druschke
49
2
0
01 May 2024
PID-Comm: A Fast and Flexible Collective Communication Framework for
  Commodity Processing-in-DIMM Devices
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh
Junguk Hong
Chaemin Lim
Seong-Yeol Park
Jeehyun Kim
Hanjun Kim
Youngsok Kim
Jinho Lee
34
7
0
13 Apr 2024
SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical
  Reasoning in Large Language Models
SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models
Hyeonwoo Kim
Gyoungjin Gim
Yungi Kim
Jihoo Kim
Byungju Kim
Wonseok Lee
Chanjun Park
ReLM
LRM
34
1
0
05 Apr 2024
Linear Attention Sequence Parallelism
Linear Attention Sequence Parallelism
Weigao Sun
Zhen Qin
Dong Li
Xuyang Shen
Yu Qiao
Yiran Zhong
76
2
0
03 Apr 2024
Partitioned Neural Network Training via Synthetic Intermediate Labels
Partitioned Neural Network Training via Synthetic Intermediate Labels
C. V. Karadag
Nezih Topaloglu
34
1
0
17 Mar 2024
Simple and Scalable Strategies to Continually Pre-train Large Language
  Models
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats L. Richter
Quentin Anthony
Timothée Lesort
Eugene Belilovsky
Irina Rish
KELM
CLL
44
54
0
13 Mar 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu Zhang
Beidi Chen
Zhangyang Wang
A. Anandkumar
Yuandong Tian
43
178
0
06 Mar 2024
HeteGen: Heterogeneous Parallel Inference for Large Language Models on
  Resource-Constrained Devices
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
Xuanlei Zhao
Bin Jia
Hao Zhou
Ziming Liu
Shenggan Cheng
Yang You
27
4
0
02 Mar 2024
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
Sunghyeon Woo
Baeseong Park
Byeongwook Kim
Minjung Jo
S. Kwon
Dongsuk Jeon
Dongsoo Lee
65
2
0
27 Feb 2024
LEIA: Facilitating Cross-lingual Knowledge Transfer in Language Models
  with Entity-based Data Augmentation
LEIA: Facilitating Cross-lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation
Ikuya Yamada
Ryokan Ri
KELM
25
0
0
18 Feb 2024
Large Language Models: A Survey
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
134
375
0
09 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
130
109
0
08 Feb 2024
BetterV: Controlled Verilog Generation with Discriminative Guidance
BetterV: Controlled Verilog Generation with Discriminative Guidance
Zehua Pei
Hui-Ling Zhen
M. Yuan
Yu Huang
Bei Yu
32
56
0
03 Feb 2024
Nomic Embed: Training a Reproducible Long Context Text Embedder
Nomic Embed: Training a Reproducible Long Context Text Embedder
Zach Nussbaum
John X. Morris
Brandon Duderstadt
Andriy Mulyar
27
100
0
02 Feb 2024
TeenyTinyLlama: open-source tiny language models trained in Brazilian
  Portuguese
TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese
N. Corrêa
Sophia Falk
Shiza Fatimah
Aniket Sen
N. D. Oliveira
30
9
0
30 Jan 2024
ConFit: Improving Resume-Job Matching using Data Augmentation and
  Contrastive Learning
ConFit: Improving Resume-Job Matching using Data Augmentation and Contrastive Learning
Xiao Yu
Jinzhong Zhang
Zhou Yu
43
1
0
29 Jan 2024
Improving Medical Reasoning through Retrieval and Self-Reflection with
  Retrieval-Augmented Large Language Models
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models
Minbyul Jeong
Jiwoong Sohn
Mujeen Sung
Jaewoo Kang
23
29
0
27 Jan 2024
AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence
  Inference
AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
Xuanlei Zhao
Shenggan Cheng
Guangyang Lu
Jiarui Fang
Hao Zhou
Bin Jia
Ziming Liu
Yang You
MQ
17
3
0
19 Jan 2024
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Kaan Ozkara
Can Karakus
Parameswaran Raman
Mingyi Hong
Shoham Sabach
B. Kveton
V. Cevher
27
2
0
17 Jan 2024
Private Fine-tuning of Large Language Models with Zeroth-order Optimization
Private Fine-tuning of Large Language Models with Zeroth-order Optimization
Xinyu Tang
Ashwinee Panda
Milad Nasr
Saeed Mahloujifar
Prateek Mittal
47
18
0
09 Jan 2024
4M: Massively Multimodal Masked Modeling
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Ouguzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
50
64
0
11 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language
  Models with 3D Parallelism
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
41
31
0
08 Dec 2023
FlexModel: A Framework for Interpretability of Distributed Large
  Language Models
FlexModel: A Framework for Interpretability of Distributed Large Language Models
Matthew Choi
Muhammad Adil Asif
John Willes
David Emerson
AI4CE
ALM
30
1
0
05 Dec 2023
PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
Lei Guan
Dongsheng Li
Jiye Liang
Wenjian Wang
Wenjian Wang
Xicheng Lu
40
1
0
01 Dec 2023
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training
  Regime and Better Alignment to Human Preferences
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences
Yuanhe Tian
Ruyi Gan
Yan Song
Jiaxing Zhang
Yongdong Zhang
AI4MH
AI4CE
LM&MA
27
31
0
10 Nov 2023
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Kai Lv
Hang Yan
Qipeng Guo
Haijun Lv
Xipeng Qiu
ODL
27
20
0
16 Oct 2023
Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
Wenqi Jiang
Marco Zeller
R. Waleffe
Torsten Hoefler
Gustavo Alonso
54
14
0
15 Oct 2023
Baichuan 2: Open Large-scale Language Models
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
77
710
0
19 Sep 2023
FLM-101B: An Open LLM and How to Train It with $100K Budget
FLM-101B: An Open LLM and How to Train It with 100KBudget100K Budget100KBudget
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
Li Du
Bowen Qin
Zheng-Wei Zhang
Aixin Sun
Yequan Wang
60
21
0
07 Sep 2023
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin
Ke Wu
Jie Li
Jun Yu Li
Wu-Jun Li
39
2
0
31 Jul 2023
DropCompute: simple and more robust distributed synchronous training via
  compute variance reduction
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
38
2
0
18 Jun 2023
Full Parameter Fine-tuning for Large Language Models with Limited
  Resources
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Kai Lv
Yuqing Yang
Tengxiao Liu
Qi-jie Gao
Qipeng Guo
Xipeng Qiu
56
127
0
16 Jun 2023
Previous
1234
Next