TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

8 August 2019
Youngeun Kwon
Yunjae Lee
Minsoo Rhu

Papers citing "TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning"

22 / 22 papers shown
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models
Yujeong Choi
Jiin Kim
Minsoo Rhu
39
1
0
11 Jun 2024
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh
Junguk Hong
Chaemin Lim
Seong-Yeol Park
Jeehyun Kim
Hanjun Kim
Youngsok Kim
Jinho Lee
34
7
0
13 Apr 2024
ACCL+: an FPGA-Based Collective Engine for Distributed Applications
Zhenhao He
Dario Korolija
Yu Zhu
Benjamin Ramhorst
Tristan Laan
L. Petrica
Michaela Blott
Gustavo Alonso
GNN
23
6
0
18 Dec 2023
Splitwise: Efficient generative LLM inference using phase splitting
Pratyush Patel
Esha Choukse
Chaojie Zhang
Aashaka Shah
Íñigo Goiri
Saeed Maleki
Ricardo Bianchini
52
204
0
30 Nov 2023
Instant-NeRF: Instant On-Device Neural Radiance Field Training via Algorithm-Accelerator Co-Designed Near-Memory Processing
Yang Katie Zhao
Shang Wu
Jingqun Zhang
Sixu Li
Chaojian Li
Yingyan Lin
22
8
0
09 May 2023
On Memory Codelets: Prefetching, Recoding, Moving and Streaming Data
D. Fox
J. M. Diaz
Xiaoming Li
6
2
0
31 Jan 2023
Failure Tolerant Training with Persistent Memory Disaggregation over CXL
Miryeong Kwon
Junhyeok Jang
Hanjin Choi
Sangwon Lee
Myoungsoo Jung
29
8
0
14 Jan 2023
An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System
Juan Gómez Luna
Yu-Yin Guo
Sylvan Brocard
Julien Legriel
Remy Cimadomo
Geraldo F. Oliveira
Gagandeep Singh
O. Mutlu
VLM
33
15
0
16 Jul 2022
Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases
Geraldo F. Oliveira
Amirali Boroumand
Saugata Ghose
Juan Gómez Luna
O. Mutlu
28
7
0
29 May 2022
SmartSAGE: Training Large-scale Graph Neural Networks using In-Storage Processing Architectures
Yunjae Lee
Jin-Won Chung
Minsoo Rhu
GNN
29
48
0
10 May 2022
Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards
Youngeun Kwon
Minsoo Rhu
21
27
0
10 May 2022
Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation
Liu Ke
Udit Gupta
Mark Hempstead
Carole-Jean Wu
Hsien-Hsin S. Lee
Xuan Zhang
26
21
0
14 Mar 2022
GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks
Ranggi Hwang
M. Kang
Jiwon Lee
D. Kam
Youngjoo Lee
Minsoo Rhu
GNN
16
20
0
01 Mar 2022
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Lois Orosa
Skanda Koppula
Yaman Umuroglu
Konstantinos Kanellopoulos
Juan Gómez Luna
Michaela Blott
K. Vissers
O. Mutlu
43
4
0
04 Feb 2022
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Hanrui Wang
Zhekai Zhang
Song Han
43
377
0
17 Dec 2020
Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
Bilge Acun
Matthew Murphy
Xiaodong Wang
Jade Nie
Carole-Jean Wu
K. Hazelwood
36
109
0
11 Nov 2020
CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery
Kiwan Maeng
Shivam Bharuka
Isabel Gao
M. C. Jeffrey
V. Saraph
...
Caroline Trippel
Jiyan Yang
Michael G. Rabbat
Brandon Lucia
Carole-Jean Wu
OffRL
24
31
0
05 Nov 2020
LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
Yujeong Choi
Yunseong Kim
Minsoo Rhu
24
66
0
25 Oct 2020
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training
Youngeun Kwon
Yunjae Lee
Minsoo Rhu
27
40
0
25 Oct 2020
Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
Saeed Rashidi
Matthew Denton
Srinivas Sridharan
Sudarshan Srinivasan
Amoghavarsha Suresh
Jade Nie
T. Krishna
26
45
0
30 Jun 2020
DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference
Udit Gupta
Samuel Hsia
V. Saraph
Xiaodong Wang
Brandon Reagen
Gu-Yeon Wei
Hsien-Hsin S. Lee
David Brooks
Carole-Jean Wu
GNN
36
188
0
08 Jan 2020
The Architectural Implications of Facebook's DNN-based Personalized Recommendation
Udit Gupta
Carole-Jean Wu
Xiaodong Wang
Maxim Naumov
Brandon Reagen
...
Andrey Malevich
Dheevatsa Mudigere
M. Smelyanskiy
Liang Xiong
Xuan Zhang
GNN
44
290
0
06 Jun 2019