arXiv: 1912.04481
SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads
10 December 2019
S. Xi
Yuan Yao
K. Bhardwaj
P. Whatmough
Gu-Yeon Wei
David Brooks
Papers citing "SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads"
Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
Zhi-Gang Liu
P. Whatmough
Matthew Mattina
17
81
0
16 May 2020
ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems
Patrick Hansen
Alexey Vilkin
Yu. V. Khrustalev
J. Imber
David Hanwell
Matthew Mattina
P. Whatmough
VLM
14
10
0
18 Nov 2019
ASV: Accelerated Stereo Vision System
Yu Feng
P. Whatmough
Yuhao Zhu
27
34
0
15 Nov 2019
MLPerf Inference Benchmark
Vijayarāghava Reḍḍī
C. Cheng
David Kanter
Pete H Mattson
Guenther Schmuelling
...
Bing Yu
George Y. Yuan
Aaron Zhong
P. Zhang
Yuchen Zhou
55
493
0
06 Nov 2019
Exploiting Parallelism Opportunities with Deep Learning Frameworks
Y. Wang
Carole-Jean Wu
Xiaodong Wang
K. Hazelwood
David Brooks
30
30
0
13 Aug 2019
FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning
P. Whatmough
Chuteng Zhou
Patrick Hansen
S. Venkataramanaiah
Jae-sun Seo
Matthew Mattina
34
57
0
27 Feb 2019
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
Jongsoo Park
Maxim Naumov
Protonu Basu
Summer Deng
Aravind Kalaiah
...
Lin Qiao
Vijay Rao
Nadav Rotem
S. Yoo
M. Smelyanskiy
FedML
GNN
BDL
39
187
0
24 Nov 2018
SCALE-Sim: Systolic CNN Accelerator Simulator
A. Samajdar
Yuhao Zhu
P. Whatmough
Matthew Mattina
Tushar Krishna
68
137
0
16 Oct 2018
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
Xuan S. Yang
Mingyu Gao
Qiaoyi Liu
Jeff Setter
Jing Pu
...
Kaidi Cao
Heonjae Ha
Priyanka Raina
Christos Kozyrakis
M. Horowitz
127
228
0
10 Sep 2018
Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach Using MAESTRO
Hyoukjun Kwon
Prasanth Chatarasi
Michael Pellauer
A. Parashar
Vivek Sarkar
T. Krishna
29
10
0
04 May 2018
Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision
Yuhao Zhu
A. Samajdar
Matthew Mattina
P. Whatmough
124
87
0
29 Mar 2018
EVA²: Exploiting Temporal Redundancy in Live Computer Vision
Mark Buckler
Philip Bedoukian
Suren Jayasuriya
Adrian Sampson
83
76
0
16 Mar 2018
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Tianqi Chen
T. Moreau
Ziheng Jiang
Lianmin Zheng
Eddie Q. Yan
...
Leyuan Wang
Yuwei Hu
Luis Ceze
Carlos Guestrin
Arvind Krishnamurthy
103
374
0
12 Feb 2018
CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices
Caiwen Ding
Siyu Liao
Yanzhi Wang
Zhe Li
Ning Liu
...
Yipeng Zhang
Jian Tang
Qinru Qiu
Xinyu Lin
Bo Yuan
GNN
43
260
0
29 Aug 2017
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks
A. Parashar
Minsoo Rhu
Anurag Mukkara
A. Puglielli
Rangharajan Venkatesan
Brucek Khailany
J. Emer
S. Keckler
W. Dally
46
1,122
0
23 May 2017
In-Datacenter Performance Analysis of a Tensor Processing Unit
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
...
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
148
4,619
0
16 Apr 2017
TensorFlow: A system for large-scale machine learning
Martín Abadi
P. Barham
Jianmin Chen
Zhiwen Chen
Andy Davis
...
Vijay Vasudevan
Pete Warden
Martin Wicke
Yuan Yu
Xiaoqiang Zhang
GNN
AI4CE
281
18,300
0
27 May 2016
DLAU: A Scalable Deep Learning Accelerator Unit on FPGA
Chao Wang
Qi Yu
Lei Gong
Xi Li
Yuan Xie
Xuehai Zhou
AI4CE
11
302
0
23 May 2016
EIE: Efficient Inference Engine on Compressed Deep Neural Network
Song Han
Xingyu Liu
Huizi Mao
Jing Pu
A. Pedram
M. Horowitz
W. Dally
89
2,453
0
04 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
1.1K
192,638
0
10 Dec 2015
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
Tianqi Chen
Mu Li
Yutian Li
Min Lin
Naiyan Wang
Minjie Wang
Tianjun Xiao
Bing Xu
Chiyuan Zhang
Zheng Zhang
79
2,243
0
03 Dec 2015
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
Djork-Arné Clevert
Thomas Unterthiner
Sepp Hochreiter
184
5,502
0
23 Nov 2015
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia
Evan Shelhamer
Jeff Donahue
Sergey Karayev
Jonathan Long
Ross B. Girshick
S. Guadarrama
Trevor Darrell
VLM
BDL
3DV
150
14,703
0
20 Jun 2014