2005.04680
Cited By
Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
10 May 2020
Dhiraj D. Kalamkar
E. Georganas
Sudarshan Srinivasan
Jianping Chen
Mikhail Shiryaev
A. Heinecke
Papers citing
"Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures"
21 / 21 papers shown
Title
PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences
Pingyi Huo
Anusha Devulapally
Hasan Al Maruf
Minseo Park
Krishnakumar Nair
Meena Arunachalam
Gulsum Gudukbay Akbulut
M. Kandemir
Vijaykrishnan Narayanan
34
0
0
25 Sep 2024
Efficient Tabular Data Preprocessing of ML Pipelines
Yu Zhu
Wenqi Jiang
Gustavo Alonso
LMTD
27
1
0
23 Sep 2024
Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings
Yassaman Ebrahimzadeh Maboud
Muhammad Adnan
Divyat Mahajan
Prashant J. Nair
AI4TS
40
0
0
22 Mar 2024
[Experiments & Analysis] Evaluating the Feasibility of Sampling-Based Techniques for Training Multilayer Perceptrons
Sana Ebrahimi
Rishi Advani
Abolfazl Asudeh
19
0
0
15 Jun 2023
Mem-Rec: Memory Efficient Recommendation System using Alternative Representation
Gopu Krishna Jha
Anthony Thomas
Nilesh Jain
Sameh Gobriel
Tajana Rosing
Ravi Iyer
53
2
0
12 May 2023
Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
Yujeong Choi
John Kim
Minsoo Rhu
21
1
0
23 Feb 2023
KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources
Baolin Li
S. Samsi
V. Gadepally
Devesh Tiwari
22
11
0
12 Oct 2022
RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances
Baolin Li
Rohan Basu Roy
Tirthak Patel
V. Gadepally
K. Gettings
Devesh Tiwari
32
25
0
23 Jul 2022
The trade-offs of model size in large recommendation models: A 10000× compressed criteo-tb DLRM model (100 GB parameters to mere 10MB)
Aditya Desai
Anshumali Shrivastava
AI4CE
20
3
0
21 Jul 2022
FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure
Y. F. Arthanto
David Ojika
Joo-Young Kim
FedML
58
2
0
11 Jul 2022
Deep Learning Models on CPUs: A Methodology for Efficient Training
Quchen Fu
Ramesh Chukka
Keith Achorn
Thomas Atta-fosu
Deepak R. Canchi
Zhongwei Teng
Jules White
Douglas C. Schmidt
21
1
0
20 Jun 2022
FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems
Rui Ma
E. Georganas
A. Heinecke
Andrew Boutros
Eriko Nurvitadhi
GNN
24
12
0
22 Apr 2022
Heterogeneous Acceleration Pipeline for Recommendation System Training
Muhammad Adnan
Yassaman Ebrahimzadeh Maboud
Divyat Mahajan
Prashant J. Nair
31
18
0
11 Apr 2022
Building a Performance Model for Deep Learning Recommendation Model Training on GPUs
Zhongyi Lin
Louis Feng
E. K. Ardestani
Jaewon Lee
J. Lundell
Changkyu Kim
A. Kejariwal
John Douglas Owens
22
19
0
19 Jan 2022
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads
E. Georganas
Dhiraj D. Kalamkar
Sasikanth Avancha
Menachem Adelman
Deepti Aggarwal
...
Ramanarayan Mohanty
Hans Pabst
Brian Retford
Barukh Ziv
A. Heinecke
26
17
0
12 Apr 2021
Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models
Dheevatsa Mudigere
Y. Hao
Jianyu Huang
Zhihao Jia
Andrew Tulloch
...
Ajit Mathews
Lin Qiao
M. Smelyanskiy
Bill Jia
Vijay Rao
34
149
0
12 Apr 2021
ECRM: Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding
Kaige Liu
J. Kosaian
K. V. Rashmi
32
4
0
05 Apr 2021
CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery
Kiwan Maeng
Shivam Bharuka
Isabel Gao
M. C. Jeffrey
V. Saraph
...
Caroline Trippel
Jiyan Yang
Michael G. Rabbat
Brandon Lucia
Carole-Jean Wu
OffRL
24
31
0
05 Nov 2020
Cross-Stack Workload Characterization of Deep Recommendation Systems
Samuel Hsia
Udit Gupta
Mark Wilkening
Carole-Jean Wu
Gu-Yeon Wei
David Brooks
BDL
GNN
HAI
25
32
0
10 Oct 2020
Accelerating Recommender Systems via Hardware "scale-in"
S. Krishna
Ravi Krishna
GNN
LRM
24
6
0
11 Sep 2020
Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems
Maxim Naumov
John Kim
Dheevatsa Mudigere
Srinivas Sridharan
Xiaodong Wang
...
Krishnakumar Nair
Isabel Gao
Bor-Yiing Su
Jiyan Yang
M. Smelyanskiy
GNN
46
95
0
20 Mar 2020