arXiv:1908.03072
TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
8 August 2019
Youngeun Kwon
Yunjae Lee
Minsoo Rhu
Papers citing
"TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning"
22 / 22 papers shown
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models
Yujeong Choi
Jiin Kim
Minsoo Rhu
39
1
0
11 Jun 2024
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh
Junguk Hong
Chaemin Lim
Seong-Yeol Park
Jeehyun Kim
Hanjun Kim
Youngsok Kim
Jinho Lee
34
7
0
13 Apr 2024
ACCL+: an FPGA-Based Collective Engine for Distributed Applications
Zhenhao He
Dario Korolija
Yu Zhu
Benjamin Ramhorst
Tristan Laan
L. Petrica
Michaela Blott
Gustavo Alonso
GNN
23
6
0
18 Dec 2023
Splitwise: Efficient generative LLM inference using phase splitting
Pratyush Patel
Esha Choukse
Chaojie Zhang
Aashaka Shah
Íñigo Goiri
Saeed Maleki
Ricardo Bianchini
52
204
0
30 Nov 2023
Instant-NeRF: Instant On-Device Neural Radiance Field Training via Algorithm-Accelerator Co-Designed Near-Memory Processing
Yang Katie Zhao
Shang Wu
Jingqun Zhang
Sixu Li
Chaojian Li
Yingyan Lin
22
8
0
09 May 2023
On Memory Codelets: Prefetching, Recoding, Moving and Streaming Data
D. Fox
J. M. Diaz
Xiaoming Li
6
2
0
31 Jan 2023
Failure Tolerant Training with Persistent Memory Disaggregation over CXL
Miryeong Kwon
Junhyeok Jang
Hanjin Choi
Sangwon Lee
Myoungsoo Jung
29
8
0
14 Jan 2023
An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System
Juan Gómez Luna
Yu-Yin Guo
Sylvan Brocard
Julien Legriel
Remy Cimadomo
Geraldo F. Oliveira
Gagandeep Singh
O. Mutlu
VLM
33
15
0
16 Jul 2022
Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases
Geraldo F. Oliveira
Amirali Boroumand
Saugata Ghose
Juan Gómez Luna
O. Mutlu
28
7
0
29 May 2022
SmartSAGE: Training Large-scale Graph Neural Networks using In-Storage Processing Architectures
Yunjae Lee
Jin-Won Chung
Minsoo Rhu
GNN
29
48
0
10 May 2022
Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards
Youngeun Kwon
Minsoo Rhu
21
27
0
10 May 2022
Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation
Liu Ke
Udit Gupta
Mark Hempstead
Carole-Jean Wu
Hsien-Hsin S. Lee
Xuan Zhang
26
21
0
14 Mar 2022
GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks
Ranggi Hwang
M. Kang
Jiwon Lee
D. Kam
Youngjoo Lee
Minsoo Rhu
GNN
16
20
0
01 Mar 2022
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Lois Orosa
Skanda Koppula
Yaman Umuroglu
Konstantinos Kanellopoulos
Juan Gómez Luna
Michaela Blott
K. Vissers
O. Mutlu
43
4
0
04 Feb 2022
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Hanrui Wang
Zhekai Zhang
Song Han
43
377
0
17 Dec 2020
Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
Bilge Acun
Matthew Murphy
Xiaodong Wang
Jade Nie
Carole-Jean Wu
K. Hazelwood
36
109
0
11 Nov 2020
CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery
Kiwan Maeng
Shivam Bharuka
Isabel Gao
M. C. Jeffrey
V. Saraph
...
Caroline Trippel
Jiyan Yang
Michael G. Rabbat
Brandon Lucia
Carole-Jean Wu
OffRL
24
31
0
05 Nov 2020
LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
Yujeong Choi
Yunseong Kim
Minsoo Rhu
24
66
0
25 Oct 2020
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training
Youngeun Kwon
Yunjae Lee
Minsoo Rhu
27
40
0
25 Oct 2020
Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
Saeed Rashidi
Matthew Denton
Srinivas Sridharan
Sudarshan Srinivasan
Amoghavarsha Suresh
Jade Nie
T. Krishna
26
45
0
30 Jun 2020
DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference
Udit Gupta
Samuel Hsia
V. Saraph
Xiaodong Wang
Brandon Reagen
Gu-Yeon Wei
Hsien-Hsin S. Lee
David Brooks
Carole-Jean Wu
GNN
36
188
0
08 Jan 2020
The Architectural Implications of Facebook's DNN-based Personalized Recommendation
Udit Gupta
Carole-Jean Wu
Xiaodong Wang
Maxim Naumov
Brandon Reagen
...
Andrey Malevich
Dheevatsa Mudigere
M. Smelyanskiy
Liang Xiong
Xuan Zhang
GNN
44
290
0
06 Jun 2019