1704.04760
Cited By
In-Datacenter Performance Analysis of a Tensor Processing Unit
16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
Papers citing
"In-Datacenter Performance Analysis of a Tensor Processing Unit"
50 / 1,165 papers shown
Title
Accelerating convolutional neural network by exploiting sparsity on GPUs
Weizhi Xu
Yintai Sun
Shengyu Fan
Hui Yu
Xin Fu
33
7
0
22 Sep 2019
Scale MLPerf-0.6 models on Google TPU-v3 Pods
Sameer Kumar
Victor Bitorff
Dehao Chen
Chi-Heng Chou
Blake A. Hechtman
...
Peter Mattson
Shibo Wang
Tao Wang
Yuanzhong Xu
Zongwei Zhou
10
39
0
21 Sep 2019
SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems
Xiaofan Zhang
Haoming Lu
Cong Hao
Jiachen Li
Bowen Cheng
...
Jinjun Xiong
Thomas Huang
Humphrey Shi
Wen-mei W. Hwu
Deming Chen
39
92
0
20 Sep 2019
A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks
Xiaoyu Yu
Yuwei Wang
Jie Miao
Ephrem Wu
Heng Zhang
Yu Meng
Bo Zhang
Biao Min
Dewei Chen
Jianlin Gao
33
21
0
17 Sep 2019
High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS
Shihui Yin
Xiaoyu Sun
Shimeng Yu
Jae-sun Seo
MQ
12
104
0
16 Sep 2019
Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training
Yuxin Wang
Qiang-qiang Wang
Shaoshuai Shi
Xin He
Zhenheng Tang
Kaiyong Zhao
Xiaowen Chu
25
3
0
15 Sep 2019
Heterogeneous Dataflow Accelerators for Multi-DNN Workloads
Hyoukjun Kwon
Liangzhen Lai
Michael Pellauer
T. Krishna
Yu-Hsin Chen
Vikas Chandra
19
16
0
13 Sep 2019
DASNet: Dynamic Activation Sparsity for Neural Network Efficiency Improvement
Qing Yang
Jiachen Mao
Zuoguan Wang
H. Li
21
15
0
13 Sep 2019
Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model
Anjuli Kannan
A. Datta
Tara N. Sainath
Eugene Weinstein
Bhuvana Ramabhadran
Yonghui Wu
Ankur Bapna
Zhehuai Chen
Seungjin Lee
AuLLM
26
174
0
11 Sep 2019
Unrolling Ternary Neural Networks
Stephen Tridgell
M. Kumm
M. Hardieck
David Boland
Duncan J. M. Moss
P. Zipf
Philip H. W. Leong
29
27
0
09 Sep 2019
PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units
Yujeong Choi
Minsoo Rhu
6
128
0
06 Sep 2019
ModiPick: SLA-aware Accuracy Optimization For Mobile Deep Inference
Samuel S. Ogden
Tian Guo
14
3
0
04 Sep 2019
Sparse Deep Neural Network Graph Challenge
J. Kepner
Simon Alford
V. Gadepally
Michael Jones
Lauren Milechin
Ryan A. Robinett
S. Samsi
GNN
19
49
0
02 Sep 2019
Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset
Bill Byrne
Karthikeyan K
Chinnadhurai Sankar
Arvind Neelakantan
Daniel Duckworth
Semih Yavuz
Ben Goodrich
Amit Dubey
A. Cedilnik
Kyu-Young Kim
18
215
0
01 Sep 2019
TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir
S. Samsi
Michael Houle
14
4
0
29 Aug 2019
High Performance Scalable FPGA Accelerator for Deep Neural Networks
Sudarshan Srinivasan
Pradeep Janedula
Saurabh Dhoble
Sasikanth Avancha
Dipankar Das
Naveen Mellempudi
Bharat Daga
M. Langhammer
Gregg Baeckler
Bharat Kaul
11
3
0
29 Aug 2019
Extending TensorFlow's Semantics with Pipelined Execution
S. Whitlock
James R. Larus
Edouard Bugnion
9
1
0
25 Aug 2019
A Computational Model for Tensor Core Units
Rezaul Chowdhury
Francesco Silvestri
Flavio Vella
6
15
0
19 Aug 2019
Automatic Compiler Based FPGA Accelerator for CNN Training
S. Venkataramanaiah
Yufei Ma
Shihui Yin
Eriko Nurvitadhi
A. Dasu
Yu Cao
Jae-sun Seo
32
38
0
15 Aug 2019
Accelerated CNN Training Through Gradient Approximation
Ziheng Wang
Sree Harsha Nelaturu
179
5
0
15 Aug 2019
AIBench: An Industry Standard Internet Service AI Benchmark Suite
Wanling Gao
Fei Tang
Lei Wang
Jianfeng Zhan
Chunxin Lan
...
Yatao Li
Junchao Shao
Zhenyu Wang
Xiaoyu Wang
Hainan Ye
33
45
0
13 Aug 2019
TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning
Youngeun Kwon
Yunjae Lee
Minsoo Rhu
21
208
0
08 Aug 2019
3D-aCortex: An Ultra-Compact Energy-Efficient Neurocomputing Platform Based on Commercial 3D-NAND Flash Memories
Mohammad Bavandpour
Shubham Sahay
M. Mahmoodi
D. Strukov
24
29
0
07 Aug 2019
Tuning Algorithms and Generators for Efficient Edge Inference
R. Naous
Lazar Supic
Yoonhwan Kang
Ranko Seradejovic
Anish Singhani
Vladimir M. Stojanović
14
2
0
31 Jul 2019
HPC AI500: A Benchmark Suite for HPC AI Systems
Zihan Jiang
Wanling Gao
Lei Wang
Xingwang Xiong
Yuchen Zhang
...
Yunquan Zhang
Shengzhong Feng
KenLi Li
Weijia Xu
Jianfeng Zhan
ELM
24
40
0
27 Jul 2019
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
Y. Wang
Gu-Yeon Wei
David Brooks
ELM
VLM
36
274
0
24 Jul 2019
A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection
Yue Liu
Zeyi Wen
Zhaomin Wu
Sixu Hu
Naibo Wang
Yuan N. Li
Xu Liu
Bingsheng He
FedML
37
975
0
23 Jul 2019
Achieving Super-Linear Speedup across Multi-FPGA for Real-Time DNN Inference
Weiwen Jiang
E. Sha
Xinyi Zhang
Lei Yang
Qingfeng Zhuge
Yiyu Shi
Jiaxi Hu
11
75
0
21 Jul 2019
Convergence of Edge Computing and Deep Learning: A Comprehensive Survey
Xiaofei Wang
Yiwen Han
Victor C. M. Leung
Dusit Niyato
Xueqiang Yan
Xu Chen
24
978
0
19 Jul 2019
A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels
Peng Chen
Mohamed Wahib
Shinichiro Takizawa
Ryousei Takano
Satoshi Matsuoka
11
22
0
14 Jul 2019
A semi-holographic hyperdimensional representation system for hardware-friendly cognitive computing
Alexandrou Serb
I. Kobyzev
Jiaqi Wang
T. Prodromakis
9
3
0
12 Jul 2019
VarGNet: Variable Group Convolutional Neural Network for Efficient Embedded Computing
Qian Zhang
Jianjun Li
Meng Yao
Liangchen Song
Helong Zhou
Zhichao Li
Wenming Meng
Xuezhi Zhang
Guoli Wang
26
22
0
12 Jul 2019
Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
N. Arivazhagan
Ankur Bapna
Orhan Firat
Dmitry Lepikhin
Melvin Johnson
...
George F. Foster
Colin Cherry
Wolfgang Macherey
Zhehuai Chen
Yonghui Wu
37
423
0
11 Jul 2019
Making AI Forget You: Data Deletion in Machine Learning
Antonio A. Ginart
M. Guan
Gregory Valiant
James Zou
MU
13
462
0
11 Jul 2019
Joint Speech Recognition and Speaker Diarization via Sequence Transduction
Laurent El Shafey
H. Soltau
Izhak Shafran
39
99
0
09 Jul 2019
Point-Voxel CNN for Efficient 3D Deep Learning
Zhijian Liu
Haotian Tang
Chengyue Wu
Song Han
3DPC
67
663
0
08 Jul 2019
Speech bandwidth extension with WaveNet
Archit Gupta
Brendan Shillingford
Yannis Assael
Thomas C. Walters
27
28
0
05 Jul 2019
Single-Path Mobile AutoML: Efficient ConvNet Design and NAS Hyperparameter Optimization
Dimitrios Stamoulis
Ruizhou Ding
Di Wang
Dimitrios Lymberopoulos
B. Priyantha
Jie Liu
Diana Marculescu
12
33
0
01 Jul 2019
On improving deep learning generalization with adaptive sparse connectivity
Shiwei Liu
Decebal Constantin Mocanu
Mykola Pechenizkiy
ODL
20
7
0
27 Jun 2019
Learning Data Augmentation Strategies for Object Detection
Barret Zoph
E. D. Cubuk
Golnaz Ghiasi
Nayeon Lee
Jonathon Shlens
Quoc V. Le
39
523
0
26 Jun 2019
ALTIS: Modernizing GPGPU Benchmarking
Bodun Hu
Christopher J. Rossbach
12
3
0
25 Jun 2019
The Coming Age of Pervasive Data Processing
Jan S. Rellermeyer
Sobhan Omranian Khorasani
D. Graur
Apourva Parthasarathy
20
5
0
21 Jun 2019
Joint Regularization on Activations and Weights for Efficient Neural Network Pruning
Q. Yang
W. Wen
Zuoguan Wang
H. Li
13
1
0
19 Jun 2019
High-Performance Deep Learning via a Single Building Block
E. Georganas
K. Banerjee
Dhiraj D. Kalamkar
Sasikanth Avancha
Anand Venkat
Michael J. Anderson
G. Henry
Hans Pabst
A. Heinecke
26
12
0
15 Jun 2019
Stand-Alone Self-Attention in Vision Models
Prajit Ramachandran
Niki Parmar
Ashish Vaswani
Irwan Bello
Anselm Levskaya
Jonathon Shlens
VLM
SLR
ViT
38
1,199
0
13 Jun 2019
Parameterized Structured Pruning for Deep Neural Networks
Günther Schindler
Wolfgang Roth
Franz Pernkopf
Holger Froening
26
6
0
12 Jun 2019
PABO: Pseudo Agent-Based Multi-Objective Bayesian Hyperparameter Optimization for Efficient Neural Accelerator Design
Maryam Parsa
Aayush Ankit
A. Ziabari
Kaushik Roy
19
28
0
11 Jun 2019
ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining
Vojtěch Mrázek
Z. Vašíček
Lukáš Sekanina
Muhammad Abdullah Hanif
Mohamed Bennai
19
92
0
11 Jun 2019
Parallel Scheduled Sampling
Daniel Duckworth
Arvind Neelakantan
Ben Goodrich
Lukasz Kaiser
Samy Bengio
33
23
0
11 Jun 2019
Meta-Learning Neural Bloom Filters
Jack W. Rae
Sergey Bartunov
Timothy Lillicrap
11
32
0
10 Jun 2019