2101.12127
tf.data: A Machine Learning Data Processing Framework
28 January 2021
D. Murray
Jiří Šimša
Ana Klimovic
Ihor Indyk
PINN
AI4CE
LMTD
Papers citing
"tf.data: A Machine Learning Data Processing Framework"
30 / 30 papers shown
Title
OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training
Juntao Zhao
Qi Lu
Wei Jia
Borui Wan
Lei Zuo
...
Size Zheng
H. Lin
Haibin Lin
Xin Liu
Chuan Wu
AI4CE
37
0
0
14 Apr 2025
Mixtera: A Data Plane for Foundation Model Training
Maximilian Böther
Xiaozhe Yao
Tolga Kerimoglu
Dan Graur
Viktor Gsteiger
Ana Klimovic
MoE
101
0
0
27 Feb 2025
Data Analysis Prediction over Multiple Unseen Datasets: A Vector Embedding Approach
Andreas Loizou
Dimitrios Tsoumakos
43
0
0
24 Feb 2025
Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
Haoyang Li
Fangcheng Fu
Hao Ge
Sheng Lin
Xuanyu Wang
Jiawen Niu
Yijiao Wang
Hailin Zhang
Xiaonan Nie
Bin Cui
MoMe
41
2
0
17 Oct 2024
TensorSocket: Shared Data Loading for Deep Learning Training
Ties Robroek
Neil Kim Nielsen
Pınar Tözün
28
2
0
27 Sep 2024
Efficient Tabular Data Preprocessing of ML Pipelines
Yu Zhu
Wenqi Jiang
Gustavo Alonso
LMTD
27
1
0
23 Sep 2024
AI-coupled HPC Workflow Applications, Middleware and Performance
Wes Brewer
Ana Gainaru
Frédéric Suter
Feiyi Wang
M. Emani
S. Jha
30
10
0
20 Jun 2024
PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models
Yunjae Lee
Hyeseong Kim
Minsoo Rhu
42
3
0
11 Jun 2024
KerasCV and KerasNLP: Vision and Language Power-Ups
Matthew Watson
Divyashree Shivakumar Sreepathihalli
François Chollet
Martin Gorner
Kiranbir Sodhia
...
Chen Qian
Jonathan Bischof
Ian Stenbit
Abheesht Sharma
Anshuman Mishra
CLIP
VLM
27
1
0
30 May 2024
I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey
Noah Lewis
J. L. Bez
Suren Byna
57
0
0
16 Apr 2024
Bullion: A Column Store for Machine Learning
Gang Liao
Ye Liu
Jianjun Chen
Daniel J. Abadi
37
5
0
13 Apr 2024
Characterization of Large Language Model Development in the Datacenter
Qi Hu
Zhisheng Ye
Zerui Wang
Guoteng Wang
Mengdie Zhang
...
Dahua Lin
Xiaolin Wang
Yingwei Luo
Yonggang Wen
Tianwei Zhang
56
43
0
12 Mar 2024
InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models
Kabir Nagrecha
Lingyi Liu
P. Delgado
Prasanna Padmanabhan
OffRL
AI4CE
33
5
0
13 Aug 2023
Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning
Zachary B. Charles
Nicole Mitchell
Krishna Pillutla
Michael Reneer
Zachary Garrett
FedML
AI4CE
38
28
0
18 Jul 2023
FFCV: Accelerating Training by Removing Data Bottlenecks
Guillaume Leclerc
Andrew Ilyas
Logan Engstrom
Sung Min Park
Hadi Salman
A. Madry
29
67
0
21 Jun 2023
tf.data service: A Case for Disaggregating ML Input Data Processing
Andrew Audibert
Yangrui Chen
D. Graur
Ana Klimovic
Jiří Šimša
C. A. Thekkath
44
16
0
26 Oct 2022
Accelerating Transfer Learning with Near-Data Computation on Cloud Object Stores
Arsany Guirguis
Diana Petrescu
Florin Dinu
D. Quoc
Javier Picorel
R. Guerraoui
40
0
0
16 Oct 2022
L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training
Jonghyun Bae
W. Baek
Tae Jun Ham
Jae W. Lee
25
1
0
18 Aug 2022
DataPerf: Benchmarks for Data-Centric AI Development
Mark Mazumder
Colby R. Banbury
Xiaozhe Yao
Bojan Karlaš
W. G. Rojas
...
Carole-Jean Wu
Cody Coleman
Andrew Y. Ng
Peter Mattson
Vijay Janapa Reddi
VLM
43
102
0
20 Jul 2022
A Machine Learning Data Fusion Model for Soil Moisture Retrieval
Vishal Batchu
G. Nearing
Varun Gulshan
AI4Cl
MDE
27
2
0
20 Jun 2022
End-to-end Optimization of Machine Learning Prediction Queries
Kwanghyun Park
Karla Saur
Dalitso Banda
Rathijit Sen
Matteo Interlandi
Konstantinos Karanasos
8
41
0
31 May 2022
Exoshuffle: An Extensible Shuffle Architecture
Frank Sifei Luan
Stephanie Wang
Samyukta Yagati
Sean Kim
Kenneth Lien
Isaac Ong
Tony Hong
S. Cho
Eric Liang
Ion Stoica
9
6
0
09 Mar 2022
Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
Alexander Isenko
R. Mayer
Jeffrey Jedele
Hans-Arno Jacobsen
19
23
0
17 Feb 2022
Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines
Michael Kuchnik
Ana Klimovic
Jiří Šimša
Virginia Smith
George Amvrosiadis
56
30
0
07 Nov 2021
Synergy: Resource Sensitive DNN Scheduling in Multi-Tenant Clusters
Jayashree Mohan
Amar Phanishayee
Janardhan Kulkarni
Vijay Chidambaram
GNN
8
3
0
12 Oct 2021
Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training
Mark Zhao
Niket Agarwal
Aarti Basant
B. Gedik
Satadru Pan
...
Kevin Wilfong
Harsha Rastogi
Carole-Jean Wu
Christos Kozyrakis
Parikshit Pol
GNN
34
70
0
20 Aug 2021
Clairvoyant Prefetching for Distributed Machine Learning I/O
Nikoli Dryden
Roman Böhringer
Tal Ben-Nun
Torsten Hoefler
31
55
0
21 Jan 2021
Progressive Compressed Records: Taking a Byte out of Deep Learning Data
Michael Kuchnik
George Amvrosiadis
Virginia Smith
11
9
0
01 Nov 2019
Neural Architecture Search with Reinforcement Learning
Barret Zoph
Quoc V. Le
271
5,327
0
05 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,746
0
26 Sep 2016