SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
arXiv:2301.11913, 27 January 2023
Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov
[MoE]
Papers citing "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" (29 papers)
Nesterov Method for Asynchronous Pipeline Parallel Optimization. Thalaiyasingam Ajanthan, Sameera Ramasinghe, Yan Zuo, Gil Avraham, Alexander Long. 02 May 2025.
SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks. Nikolay Blagoev, Lydia Yiyu Chen, Oğuzhan Ersoy. 27 Feb 2025.
Stealing Training Data from Large Language Models in Decentralized Training through Activation Inversion Attack. Chenxi Dai, Lin Lu, Pan Zhou. 22 Feb 2025.
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training. Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn. 20 Nov 2024.
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression. Zhenheng Tang, Xueze Kang, Yiming Yin, Xinglin Pan, Yuxin Wang, ..., Shaohuai Shi, Amelie Chi Zhou, Bo Li, Bingsheng He, Xiaowen Chu. 16 Oct 2024. [AI4CE]
Achieving Peak Performance for Large Language Models: A Systematic Review. Z. R. K. Rostam, Sándor Szénási, Gábor Kertész. 07 Sep 2024.
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey. Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun. 29 Jul 2024.
Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey. Feng Liang, Zhen Zhang, Haifeng Lu, Chengming Li, Victor C. M. Leung, Yanyi Guo, Xiping Hu. 12 Jun 2024.
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey. Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu. 09 Apr 2024. [GNN]
ATOM: Asynchronous Training of Massive Models for Deep Learning in a Decentralized Environment. Xiaofeng Wu, Jia Rao, Wei Chen. 15 Mar 2024.
A Survey on Efficient Federated Learning Methods for Foundation Model Training. Herbert Woisetschläger, Alexander Isenko, Shiqiang Wang, R. Mayer, Hans-Arno Jacobsen. 09 Jan 2024. [FedML]
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models. Alan Chan, Ben Bucknall, Herbie Bradley, David M. Krueger. 22 Dec 2023.
Distributed Inference and Fine-tuning of Large Language Models Over The Internet. Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, Colin Raffel. 13 Dec 2023. [MoE, ALM]
Exploring the Robustness of Decentralized Training for Large Language Models. Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou. 01 Dec 2023.
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment. Youhe Jiang, Ran Yan, Xiaozhe Yao, Yang Zhou, Beidi Chen, Binhang Yuan. 20 Nov 2023. [SyDa]
Practical Performance Guarantees for Pipelined DNN Inference. Aaron Archer, Matthew Fahrbach, Kuikui Liu, Prakash Prabhu. 07 Nov 2023.
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs. Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, ..., Rongfei Zeng, Kaiyong Zhao, S. Shi, Bingsheng He, Xiaowen Chu. 03 Sep 2023.
How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study. Alexander Isenko, R. Mayer, Hans-Arno Jacobsen. 05 Jun 2023.
Scaling Expert Language Models with Unsupervised Domain Discovery. Suchin Gururangan, Margaret Li, M. Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer. 24 Mar 2023. [MoE]
What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring. Yonadav Shavit. 20 Mar 2023.
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU. Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang. 13 Mar 2023.
Petals: Collaborative Inference and Fine-tuning of Large Models. Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, Colin Raffel. 02 Sep 2022. [VLM]
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees. Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang. 02 Jun 2022. [AI4CE]
Decentralized Training of Foundation Models in Heterogeneous Environments. Binhang Yuan, Yongjun He, Jared Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, Ce Zhang. 02 Jun 2022.
Zero-Shot Text-to-Image Generation. Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever. 24 Feb 2021. [VLM]
ZeRO-Offload: Democratizing Billion-Scale Model Training. Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He. 18 Jan 2021. [MoE]
The Pile: An 800GB Dataset of Diverse Text for Language Modeling. Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy. 31 Dec 2020. [AIMat]
Scaling Laws for Neural Language Models. Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro. 17 Sep 2019. [MoE]