DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

8 January 2020
Udit Gupta
Samuel Hsia
V. Saraph
Xiaodong Wang
Brandon Reagen
Gu-Yeon Wei
Hsien-Hsin S. Lee
David Brooks
Carole-Jean Wu
    GNN

Papers citing "DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference"

50 / 75 papers shown
Title
SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models
Jinho Yang
Ji-Hoon Kim
Joo-Young Kim
44
0
0
01 Apr 2025
Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design
Haojie Ye
Yuchen Xia
Yuhan Chen
Kuan-Yu Chen
Yichao Yuan
Shuwen Deng
Baris Kasikci
T. Mudge
Nishil Talati
23
0
0
08 Nov 2024
Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs
Rishabh Jain
Vivek M. Bhasi
Adwait Jog
A. Sivasubramaniam
M. Kandemir
Chita R. Das
33
2
0
29 Oct 2024
A House United Within Itself: SLO-Awareness for On-Premises Containerized ML Inference Clusters via Faro
Beomyeol Jeon
Chen Wang
Diana Arroyo
Alaa Youssef
Indranil Gupta
37
0
0
29 Sep 2024
PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences
Pingyi Huo
Anusha Devulapally
Hasan Al Maruf
Minseo Park
Krishnakumar Nair
Meena Arunachalam
Gulsum Gudukbay Akbulut
M. Kandemir
Vijaykrishnan Narayanan
34
0
0
25 Sep 2024
Efficient Tabular Data Preprocessing of ML Pipelines
Yu Zhu
Wenqi Jiang
Gustavo Alonso
LMTD
25
1
0
23 Sep 2024
CADC: Encoding User-Item Interactions for Compressing Recommendation Model Training Data
Hossein Entezari Zarch
Abdulla Alshabanah
Chaoyi Jiang
Murali Annavaram
25
1
0
11 Jul 2024
AnnotatedTables: A Large Tabular Dataset with Language Model Annotations
Yaojie Hu
Ilias Fountalis
Jin Tian
N. Vasiloglou
LMTD
36
3
0
24 Jun 2024
PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models
Yunjae Lee
Hyeseong Kim
Minsoo Rhu
42
3
0
11 Jun 2024
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models
Yujeong Choi
Jiin Kim
Minsoo Rhu
39
1
0
11 Jun 2024
Carbon Connect: An Ecosystem for Sustainable Computing
Benjamin C. Lee
David Brooks
Arthur van Benthem
Udit Gupta
G. Hills
...
Emma Strubell
Gu-Yeon Wei
Adam Wierman
Yuan Yao
Minlan Yu
20
2
0
22 May 2024
Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services
Jiachen Liu
Zhiyu Wu
Jae-Won Chung
Fan Lai
Myungjin Lee
Mosharaf Chowdhury
48
25
0
25 Apr 2024
LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models
Juntaek Lim
Youngeun Kwon
Ranggi Hwang
Kiwan Maeng
Edward Suh
Minsoo Rhu
SyDa
31
0
0
12 Apr 2024
ACCL+: an FPGA-Based Collective Engine for Distributed Applications
Zhenhao He
Dario Korolija
Yu Zhu
Benjamin Ramhorst
Tristan Laan
L. Petrica
Michaela Blott
Gustavo Alonso
GNN
21
6
0
18 Dec 2023
CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models
Hailin Zhang
Zirui Liu
Boxuan Chen
Yikai Zhao
Tong Zhao
Tong Yang
Tengjiao Wang
37
10
0
06 Dec 2023
Splitwise: Efficient generative LLM inference using phase splitting
Pratyush Patel
Esha Choukse
Chaojie Zhang
Aashaka Shah
Íñigo Goiri
Saeed Maleki
Ricardo Bianchini
49
197
0
30 Nov 2023
ERASER: Machine Unlearning in MLaaS via an Inference Serving-Aware Approach
Yuke Hu
Jian Lou
Jiaqi Liu
Wangze Ni
Feng Lin
Zhan Qin
Kui Ren
MU
40
12
0
03 Nov 2023
Serving Deep Learning Model in Relational Databases
Alexandre Eichenberger
Qi Lin
Saif Masood
Hong Min
Alexander Sim
...
Yida Wang
Kesheng Wu
Binhang Yuan
Lixi Zhou
Jia Zou
21
0
0
07 Oct 2023
MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems
Samuel Hsia
Alicia Golden
Bilge Acun
Newsha Ardalani
Zach DeVito
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
MoE
47
9
0
04 Oct 2023
Throughput Maximization of DNN Inference: Batching or Multi-Tenancy?
Seyed Morteza Nabavinejad
M. Ebrahimi
Sherief Reda
24
1
0
26 Aug 2023
Opportunities of Renewable Energy Powered DNN Inference
Seyed Morteza Nabavinejad
Tian Guo
AI4CE
24
2
0
21 Jun 2023
S$^{3}$: Increasing GPU Utilization during Generative Inference for Higher Throughput
Yunho Jin
Chun-Feng Wu
David Brooks
Gu-Yeon Wei
29
62
0
09 Jun 2023
Mem-Rec: Memory Efficient Recommendation System using Alternative Representation
Gopu Krishna Jha
Anthony Thomas
Nilesh Jain
Sameh Gobriel
Tajana Rosing
Ravi Iyer
53
2
0
12 May 2023
Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service
Baolin Li
S. Samsi
V. Gadepally
Devesh Tiwari
28
27
0
19 Apr 2023
Reclaimer: A Reinforcement Learning Approach to Dynamic Resource Allocation for Cloud Microservices
Quintin Fettes
Avinash Karanth
Razvan Bunescu
Brandon Beckwith
S. Subramoney
22
3
0
17 Apr 2023
Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
Yujeong Choi
John Kim
Minsoo Rhu
18
1
0
23 Feb 2023
MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation
Samuel Hsia
Udit Gupta
Bilge Acun
Newsha Ardalani
Pan Zhong
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
46
17
0
21 Feb 2023
Computation vs. Communication Scaling for Future Transformers on Future Hardware
Suchita Pati
Shaizeen Aga
Mahzabeen Islam
Nuwan Jayasena
Matthew D. Sinclair
25
9
0
06 Feb 2023
GPU-based Private Information Retrieval for On-Device Machine Learning Inference
Maximilian Lam
Jeff Johnson
Wenjie Xiong
Kiwan Maeng
Udit Gupta
...
Hsien-Hsin S. Lee
Vijay Janapa Reddi
Gu-Yeon Wei
David Brooks
Edward Suh
32
9
0
26 Jan 2023
Failure Tolerant Training with Persistent Memory Disaggregation over CXL
Miryeong Kwon
Junhyeok Jang
Hanjin Choi
Sangwon Lee
Myoungsoo Jung
26
8
0
14 Jan 2023
FlexShard: Flexible Sharding for Industry-Scale Sequence Recommendation Models
Geet Sethi
Pallab Bhattacharya
Dhruv Choudhary
Carole-Jean Wu
Christos Kozyrakis
21
5
0
08 Jan 2023
DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation
Liu Ke
Xuan Zhang
Benjamin C. Lee
G. E. Suh
Hsien-Hsin S. Lee
43
8
0
02 Dec 2022
A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models
Yingcan Wei
Matthias Langer
F. Yu
Minseok Lee
Kingsley Liu
Ji Shi
Zehuan Wang
BDL
24
17
0
17 Oct 2022
Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference
Zehuan Wang
Yingcan Wei
Minseok Lee
Matthias Langer
F. Yu
...
Daniel G. Abel
Xu Guo
Jianbing Dong
Ji Shi
Kunlun Li
GNN
LRM
25
32
0
17 Oct 2022
KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources
Baolin Li
S. Samsi
V. Gadepally
Devesh Tiwari
22
11
0
12 Oct 2022
A Comprehensive Survey on Trustworthy Recommender Systems
Wenqi Fan
Xiangyu Zhao
Xiao Chen
Jingran Su
Jingtong Gao
...
Qidong Liu
Yiqi Wang
Hanfeng Xu
Lei Chen
Qing Li
FaML
43
46
0
21 Sep 2022
Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System
Zhendong Wang
Xiaoming Zeng
Xulong Tang
Danfeng Zhang
Xingbo Hu
Yang Hu
AAML
MIACV
FedML
32
6
0
29 Aug 2022
RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances
Baolin Li
Rohan Basu Roy
Tirthak Patel
V. Gadepally
K. Gettings
Devesh Tiwari
32
25
0
23 Jul 2022
Characterizing and Optimizing End-to-End Systems for Private Inference
Karthik Garimella
Zahra Ghodsi
N. Jha
S. Garg
Brandon Reagen
44
25
0
14 Jul 2022
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision
Wei Gao
Qi Hu
Zhisheng Ye
Peng Sun
Xiaolin Wang
Yingwei Luo
Tianwei Zhang
Yonggang Wen
86
26
0
24 May 2022
Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards
Youngeun Kwon
Minsoo Rhu
21
27
0
10 May 2022
Dynamic Network Adaptation at Inference
Daniel Mendoza
Caroline Trippel
32
0
0
18 Apr 2022
Heterogeneous Acceleration Pipeline for Recommendation System Training
Muhammad Adnan
Yassaman Ebrahimzadeh Maboud
Divyat Mahajan
Prashant J. Nair
28
18
0
11 Apr 2022
Learning to Collide: Recommendation System Model Compression with Learned Hash Functions
Benjamin Ghaemmaghami
Mustafa Ozdal
Rakesh Komuravelli
D. Korchev
Dheevatsa Mudigere
Krishnakumar Nair
Maxim Naumov
34
6
0
28 Mar 2022
ORCA: A Network and Architecture Co-design for Offloading us-scale Datacenter Applications
Yifan Yuan
Jing-yu Huang
Yan Sun
Tianchen Wang
Jacob Nelson
Dan R. K. Ports
Yipeng Wang
Ren Wang
Charlie Tai
N. Kim
34
2
0
16 Mar 2022
Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation
Liu Ke
Udit Gupta
Mark Hempstead
Carole-Jean Wu
Hsien-Hsin S. Lee
Xuan Zhang
26
21
0
14 Mar 2022
PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers
Yunseong Kim
Yujeong Choi
Minsoo Rhu
20
15
0
27 Feb 2022
BagPipe: Accelerating Deep Recommendation Model Training
Saurabh Agarwal
Chengpo Yan
Ziyi Zhang
Shivaram Venkataraman
31
17
0
24 Feb 2022
RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation
Geet Sethi
Bilge Acun
Niket Agarwal
Christos Kozyrakis
Caroline Trippel
Carole-Jean Wu
47
66
0
25 Jan 2022
SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems
Christina Giannoula
Ivan Fernandez
Juan Gómez Luna
N. Koziris
G. Goumas
O. Mutlu
MoE
18
26
0
13 Jan 2022