Saturn: An Optimized Data System for Large Model Deep Learning Workloads
v2 (latest)

3 September 2023
Kabir Nagrecha
Arun Kumar
arXiv (abs) · PDF · HTML

Papers citing "Saturn: An Optimized Data System for Large Model Deep Learning Workloads"

50 / 53 papers shown
InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models
Kabir Nagrecha
Lingyi Liu
P. Delgado
Prasanna Padmanabhan
OffRL, AI4CE
65
5
0
13 Aug 2023
Systems for Parallel and Distributed Large-Model Deep Learning Training
Kabir Nagrecha
GNN, VLM, MoE
53
7
0
06 Jan 2023
Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference
Zehuan Wang
Yingcan Wei
Minseok Lee
Matthias Langer
F. Yu
...
Daniel G. Abel
Xu Guo
Jianbing Dong
Ji Shi
Kunlun Li
GNN, LRM
29
32
0
17 Oct 2022
Neural Architecture Search using Property Guided Synthesis
Charles Jin
P. Phothilimthana
Sudip Roy
55
6
0
08 May 2022
Heterogeneous Acceleration Pipeline for Recommendation System Training
Muhammad Adnan
Yassaman Ebrahimzadeh Maboud
Divyat Mahajan
Prashant J. Nair
68
19
0
11 Apr 2022
Pathways: Asynchronous Distributed Dataflow for ML
P. Barham
Aakanksha Chowdhery
J. Dean
Sanjay Ghemawat
Steven Hand
...
Parker Schuh
Ryan Sepassi
Laurent El Shafey
C. A. Thekkath
Yonghui Wu
GNN, MoE
115
130
0
23 Mar 2022
Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective
Steven Euijong Whang
Yuji Roh
Hwanjun Song
Jae-Gil Lee
70
341
0
13 Dec 2021
Parameter Efficient Deep Probabilistic Forecasting
O. Sprangers
Sebastian Schelter
Maarten de Rijke
BDL, AI4TS
95
22
0
06 Dec 2021
Sample Selection for Fair and Robust Training
Yuji Roh
Kangwook Lee
Steven Euijong Whang
Changho Suh
59
65
0
27 Oct 2021
Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers
Yujing Ma
Florin Rusu
Kesheng Wu
A. Sim
79
3
0
13 Oct 2021
Deep Learning on Edge TPUs
Andreas M. Kist
69
17
0
31 Aug 2021
Model-Parallel Model Selection for Deep Learning Systems
Kabir Nagrecha
94
17
0
14 Jul 2021
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
GNN
83
385
0
16 Apr 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan
Mohammad Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
113
697
0
09 Apr 2021
NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access
Alexander Renz-Wieland
Rainer Gemulla
Zoi Kaoudi
Volker Markl
95
16
0
01 Apr 2021
OmniFair: A Declarative System for Model-Agnostic Group Fairness in Machine Learning
Hantian Zhang
Xu Chu
Abolfazl Asudeh
S. Navathe
FaML, VLM
66
45
0
13 Mar 2021
Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More
Shabnam Daghaghi
Nicholas Meisburger
Mengnan Zhao
Yong Wu
Sameh Gobriel
Charlie Tai
Anshumali Shrivastava
BDL, VLM, MQ
28
33
0
06 Mar 2021
Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems
Aditya Desai
Yanzhou Pan
K. Sun
Li Chou
Anshumali Shrivastava
44
10
0
24 Feb 2021
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
Zhuohan Li
Siyuan Zhuang
Shiyuan Guo
Danyang Zhuo
Hao Zhang
Basel Alomair
Ion Stoica
MoE
68
122
0
16 Feb 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
262
429
0
18 Jan 2021
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus
Barret Zoph
Noam M. Shazeer
MoE
88
2,208
0
11 Jan 2021
Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
Bilge Acun
Matthew Murphy
Xiaodong Wang
Jade Nie
Carole-Jean Wu
K. Hazelwood
75
112
0
11 Nov 2020
Scheduling Real-time Deep Learning Services as Imprecise Computations
Shuochao Yao
Yifan Hao
Yiran Zhao
Huajie Shao
Dongxin Liu
Shengzhong Liu
Tianshi Wang
Jinyang Li
Tarek Abdelzaher
98
36
0
02 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
667
41,369
0
22 Oct 2020
Transferable Graph Optimizers for ML Compilers
Yanqi Zhou
Sudip Roy
AmirAli Abdolrashidi
Daniel Wong
Peter C. Ma
...
Phitchaya Mangpo Phothilimthana
Shen Wang
Anna Goldie
Azalia Mirhoseini
James Laudon
GNN
53
55
0
21 Oct 2020
Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Aurick Qiao
Sang Keun Choe
Suhas Jayaram Subramanya
Willie Neiswanger
Qirong Ho
Hao Zhang
G. Ganger
Eric Xing
VLM
59
181
0
27 Aug 2020
Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
Deepak Narayanan
Keshav Santhanam
Fiodar Kazhamiaka
Amar Phanishayee
Matei A. Zaharia
58
209
0
20 Aug 2020
Serving DNNs like Clockwork: Performance Predictability from the Bottom Up
A. Gujarati
Reza Karimi
Safya Alzayat
Wei Hao
Antoine Kaufmann
Ymir Vigfusson
Jonathan Mace
85
280
0
03 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
826
42,332
0
28 May 2020
torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models
Chiheon Kim
Heungsub Lee
Myungryong Jeong
Woonhyuk Baek
Boogeon Yoon
Ildoo Kim
Sungbin Lim
Sungwoong Kim
MoE, AI4CE
46
54
0
21 Apr 2020
PipeMare: Asynchronous Pipeline Parallel DNN Training
Bowen Yang
Jian Zhang
Jonathan Li
Christopher Ré
Christopher R. Aberger
Christopher De Sa
68
111
0
09 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
234
7,520
0
02 Oct 2019
Elastic deep learning in multi-tenant GPU cluster
Yidi Wu
Kaihao Ma
Xiao Yan
Zhi Liu
Zhenkun Cai
Yuzhen Huang
James Cheng
Han Yuan
Fan Yu
10
2
0
26 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
331
1,914
0
17 Sep 2019
DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters
Size Zheng
Yixin Bao
Yangrui Chen
Chuan Wu
Chen Meng
Wei Lin
40
82
0
13 Sep 2019
Themis: Fair and Efficient GPU Cluster Scheduling
Kshiteej S. Mahajan
Arjun Balasubramanian
Arjun Singhvi
Shivaram Venkataraman
Aditya Akella
Amar Phanishayee
Shuchi Chawla
54
181
0
02 Jul 2019
Deep Learning Recommendation Model for Personalization and Recommendation Systems
Maxim Naumov
Dheevatsa Mudigere
Hao-Jun Michael Shi
Jianyu Huang
Narayanan Sundaraman
...
Wenlin Chen
Vijay Rao
Bill Jia
Liang Xiong
M. Smelyanskiy
91
733
0
31 May 2019
Low-Memory Neural Network Training: A Technical Report
N. Sohoni
Christopher R. Aberger
Megan Leszczynski
Jian Zhang
Christopher Ré
50
102
0
24 Apr 2019
Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
Myeongjae Jeon
Shivaram Venkataraman
Amar Phanishayee
Junjie Qian
Wencong Xiao
Fan Yang
GNN
65
352
0
17 Jan 2019
A System for Massively Parallel Hyperparameter Tuning
Liam Li
Kevin Jamieson
Afshin Rostamizadeh
Ekaterina Gonina
Moritz Hardt
Benjamin Recht
Ameet Talwalkar
68
385
0
13 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM, SSL, SSeg
1.8K
95,114
0
11 Oct 2018
Beyond Data and Model Parallelism for Deep Neural Networks
Zhihao Jia
Matei A. Zaharia
A. Aiken
GNN, AI4CE
64
505
0
14 Jul 2018
TFLMS: Large Model Support in TensorFlow by Graph Rewriting
Tung D. Le
Haruki Imai
Yasushi Negishi
K. Kawachiya
GNN
76
47
0
05 Jul 2018
PipeDream: Fast and Efficient Pipeline Parallel DNN Training
A. Harlap
Deepak Narayanan
Amar Phanishayee
Vivek Seshadri
Nikhil R. Devanur
G. Ganger
Phillip B. Gibbons
AI4CE
61
254
0
08 Jun 2018
Horovod: fast and easy distributed deep learning in TensorFlow
Alexander Sergeev
Mike Del Balso
100
1,221
0
15 Feb 2018
SLAQ: Quality-Driven Scheduling for Distributed Machine Learning
Haoyu Zhang
Logan Stafman
Andrew Or
M. Freedman
52
140
0
13 Feb 2018
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Nicolas Vasilache
O. Zinenko
Theodoros Theodoridis
Priya Goyal
Zach DeVito
William S. Moses
Sven Verdoolaege
Andrew Adams
Albert Cohen
74
436
0
13 Feb 2018
Online Job Scheduling in Distributed Machine Learning Clusters
Yixin Bao
Size Zheng
Chuan Wu
Zongpeng Li
59
110
0
03 Jan 2018
Ray: A Distributed Framework for Emerging AI Applications
Philipp Moritz
Robert Nishihara
Stephanie Wang
Alexey Tumanov
Richard Liaw
...
Melih Elibol
Zongheng Yang
William Paul
Michael I. Jordan
Ion Stoica
GNN
105
1,266
0
16 Dec 2017
Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
Hao Zhang
Zeyu Zheng
Shizhen Xu
Wei-Ming Dai
Qirong Ho
Xiaodan Liang
Zhiting Hu
Jinliang Wei
P. Xie
Eric Xing
GNN
67
347
0
11 Jun 2017