2403.05861
DeepVM: Integrating Spot and On-Demand VMs for Cost-Efficient Deep Learning Clusters in the Cloud
9 March 2024
Yoochan Kim
Kihyun Kim
Yonghyeon Cho
Jinwoo Kim
Awais Khan
Ki-Dong Kang
B. An
Myung-Hoon Cha
H. Kim
Youngjae Kim
Papers citing "DeepVM: Integrating Spot and On-Demand VMs for Cost-Efficient Deep Learning Clusters in the Cloud"
9 / 9 papers shown
Title
Cloud Cost Optimization: A Comprehensive Review of Strategies and Case Studies
Saurabh Deochake
29
11
0
24 Jul 2023
SpotLake: Diverse Spot Instance Dataset Archive Service
Sungjae Lee
Jaeil Hwang
Kyungyong Lee
60
13
0
07 Feb 2022
Boost Neural Networks by Checkpoints
Feng Wang
Gu-Yeon Wei
Qiao Liu
Jinxiang Ou
Xian Wei
Hairong Lv
FedML
UQCV
52
10
0
03 Oct 2021
A Study of Checkpointing in Large Scale Training of Deep Neural Networks
Elvis Rojas
A. Kahira
Esteban Meneses
L. Bautista-Gomez
Rosa M. Badia
53
25
0
01 Dec 2020
PyTorch Distributed: Experiences on Accelerating Data Parallel Training
Shen Li
Yanli Zhao
R. Varma
Omkar Salpekar
P. Noordhuis
...
Adam Paszke
Jeff Smith
Brian Vaughan
Pritam Damania
Soumith Chintala
OOD
MoE
66
188
0
28 Jun 2020
Growing Together: Modeling Human Language Learning With n-Best Multi-Checkpoint Machine Translation
El Moatez Billah Nagoudi
Muhammad Abdul-Mageed
H. Cavusoglu
28
2
0
07 Jun 2020
torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models
Chiheon Kim
Heungsub Lee
Myungryong Jeong
Woonhyuk Baek
Boogeon Yoon
Ildoo Kim
Sungbin Lim
Sungwoong Kim
MoE
AI4CE
46
54
0
21 Apr 2020
Horovod: fast and easy distributed deep learning in TensorFlow
Alexander Sergeev
Mike Del Balso
102
1,222
0
15 Feb 2018
Checkpoint Ensembles: Ensemble Methods from a Single Training Process
Hugh Chen
Scott M. Lundberg
Su-In Lee
UQCV
55
63
0
09 Oct 2017