Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.04007
Cited By
Varuna: Scalable, Low-cost Training of Massive Deep Learning Models
7 November 2021
Sanjith Athlur
Nitika Saran
Muthian Sivathanu
Ramachandran Ramjee
Nipun Kwatra
GNN
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Varuna: Scalable, Low-cost Training of Massive Deep Learning Models"
13 / 13 papers shown
Title
ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor
Seungbeom Choi
Jeonghoe Goo
Eunjoo Jeon
Mingyu Yang
Minsung Jang
21
0
0
14 May 2025
Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training
Yijie Zheng
Bangjun Xiao
Lei Shi
Xiaoyang Li
Faming Wu
Tianyu Li
Xuefeng Xiao
Wenjie Qu
Yansen Wang
Shouda Liu
MLLM
MoE
75
1
0
31 Mar 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
253
0
0
08 Jan 2025
SkyServe: Serving AI Models across Regions and Clouds with Spot Instances
Ziming Mao
Tian Xia
Zhanghao Wu
Wei-Lin Chiang
Tyler Griggs
Romil Bhardwaj
Zongheng Yang
S. Shenker
Ion Stoica
62
2
0
03 Nov 2024
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang
Zhen Zhang
Haifeng Lu
Victor C. M. Leung
Yanyi Guo
Xiping Hu
GNN
39
6
0
09 Apr 2024
Unicron: Economizing Self-Healing LLM Training at Scale
Tao He
Xue Li
Zhibin Wang
Kun Qian
Jingbo Xu
Wenyuan Yu
Jingren Zhou
27
15
0
30 Dec 2023
EasyScale: Accuracy-consistent Elastic Training for Deep Learning
Mingzhen Li
Wencong Xiao
Biao Sun
Hanyu Zhao
Hailong Yang
...
Xianyan Jia
Yi Liu
Yong Li
Wei Lin
D. Qian
22
7
0
30 Aug 2022
Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models
Zhiquan Lai
Shengwei Li
Xudong Tang
Ke-shi Ge
Weijie Liu
Yabo Duan
Linbo Qiao
Dongsheng Li
35
41
0
10 Jun 2022
Decentralized Training of Foundation Models in Heterogeneous Environments
Binhang Yuan
Yongjun He
Jared Davis
Tianyi Zhang
Tri Dao
Beidi Chen
Percy Liang
Christopher Ré
Ce Zhang
38
91
0
02 Jun 2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Zhen Zhang
Shuai Zheng
Yida Wang
Justin Chiu
George Karypis
Trishul Chilimbi
Mu Li
Xin Jin
21
39
0
30 Apr 2022
Survey on Large Scale Neural Network Training
Julia Gusak
Daria Cherniuk
Alena Shilova
A. Katrutsa
Daniel Bershatsky
...
Lionel Eyraud-Dubois
Oleg Shlyazhko
Denis Dimitrov
Ivan Oseledets
Olivier Beaumont
27
10
0
21 Feb 2022
Compute Trends Across Three Eras of Machine Learning
J. Sevilla
Lennart Heim
A. Ho
T. Besiroglu
Marius Hobbhahn
Pablo Villalobos
39
272
0
11 Feb 2022
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,833
0
17 Sep 2019
1