1704.04760
Cited By
In-Datacenter Performance Analysis of a Tensor Processing Unit
16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
Papers citing
"In-Datacenter Performance Analysis of a Tensor Processing Unit"
50 / 1,167 papers shown
Title
Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration
Reena Elangovan
Shubham Jain
A. Raghunathan
19
7
0
25 Nov 2020
Bringing AI To Edge: From Deep Learning's Perspective
Di Liu
Hao Kong
Xiangzhong Luo
Weichen Liu
Ravi Subramaniam
116
124
0
25 Nov 2020
End-to-End Framework for Efficient Deep Learning Using Metasurfaces Optics
Carlos Mauricio Villegas Burgos
Tianqi Yang
Nick Vamivakas
Yuhao Zhu
28
0
0
23 Nov 2020
Distributed Deep Reinforcement Learning: An Overview
Mohammad Reza Samsami
Hossein Alimadad
OffRL
43
27
0
22 Nov 2020
FPGA deep learning acceleration based on convolutional neural network
Xiong Jun
21
2
0
17 Nov 2020
Customizing Trusted AI Accelerators for Efficient Privacy-Preserving Machine Learning
Peichen Xie
Xuanle Ren
Guangyu Sun
FedML
40
6
0
12 Nov 2020
DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator
Zihan Liu
Jingwen Leng
Quan Chen
Chao Li
Wenli Zheng
Li-Wei Li
Minyi Guo
31
8
0
11 Nov 2020
ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing
Cheng Tan
Chenhao Xie
Tong Geng
Andres Marquez
Antonino Tumeo
Kevin J. Barker
Ang Li
24
14
0
10 Nov 2020
Exploring the limits of Concurrency in ML Training on Google TPUs
Sameer Kumar
James Bradbury
C. Young
Yu Emma Wang
Anselm Levskaya
...
Tao Wang
Tayo Oguntebi
Yazhou Zu
Yuanzhong Xu
Andy Swing
BDL
AIMat
MoE
LRM
64
27
0
07 Nov 2020
Highly Available Data Parallel ML training on Mesh Networks
Sameer Kumar
N. Jouppi
MoE
AI4CE
45
11
0
06 Nov 2020
ReFloat: Low-Cost Floating-Point Processing in ReRAM for Accelerating Iterative Linear Solvers
Linghao Song
Fan Chen
Xuehai Qian
Hai Li
Yiran Chen
62
6
0
06 Nov 2020
CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery
Kiwan Maeng
Shivam Bharuka
Isabel Gao
M. C. Jeffrey
V. Saraph
...
Caroline Trippel
Jiyan Yang
Michael G. Rabbat
Brandon Lucia
Carole-Jean Wu
OffRL
82
33
0
05 Nov 2020
InferBench: Understanding Deep Learning Inference Serving with an Automatic Benchmarking System
Huaizheng Zhang
Yizheng Huang
Yonggang Wen
Jianxiong Yin
K. Guan
57
3
0
04 Nov 2020
Cortex: A Compiler for Recursive Deep Learning Models
Pratik Fegade
Tianqi Chen
Phillip B. Gibbons
T. Mowry
VLM
62
28
0
02 Nov 2020
Photonics for artificial intelligence and neuromorphic computing
B. Shastri
A. Tait
T. F. D. Lima
W. Pernice
H. Bhaskaran
C. Wright
Paul R. Prucnal
84
1,223
0
30 Oct 2020
Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour
Arissa Wongpanich
Hieu H. Pham
J. Demmel
Mingxing Tan
Quoc V. Le
Yang You
Sameer Kumar
78
8
0
30 Oct 2020
Training Speech Recognition Models with Federated Learning: A Quality/Cost Framework
Dhruv Guliani
F. Beaufays
Giovanni Motta
FedML
63
85
0
29 Oct 2020
Systolic Computing on GPUs for Productive Performance
Hongbo Rong
Xiaochen Hao
Yun Liang
Lidong Xu
Hong Jiang
Pradeep Dubey
16
1
0
29 Oct 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
128
86
0
27 Oct 2020
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Jens Domke
Emil Vatai
Aleksandr Drozd
Peng Chen
Yosuke Oyama
...
Shweta Salaria
Daichi Mukunoki
Artur Podobas
Mohamed Wahib
Satoshi Matsuoka
58
25
0
27 Oct 2020
Stochastic Optimization with Laggard Data Pipelines
Naman Agarwal
Rohan Anil
Tomer Koren
Kunal Talwar
Cyril Zhang
35
12
0
26 Oct 2020
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
Minjia Zhang
Yuxiong He
AI4CE
48
104
0
26 Oct 2020
LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
Yujeong Choi
Yunseong Kim
Minsoo Rhu
63
68
0
25 Oct 2020
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training
Youngeun Kwon
Yunjae Lee
Minsoo Rhu
72
41
0
25 Oct 2020
R-TOD: Real-Time Object Detector with Minimized End-to-End Delay for Autonomous Driving
Won-Seok Jang
Hansaem Jeong
Kyungtae Kang
N. Dutt
Jong-Chan Kim
44
28
0
23 Oct 2020
Brain-Inspired Learning on Neuromorphic Substrates
Friedemann Zenke
Emre Neftci
130
90
0
22 Oct 2020
Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters
Shaoshuai Shi
Xianhao Zhou
Shutao Song
Xingyao Wang
Zilin Zhu
...
Chenyang Guo
Bo Yang
Zhibo Chen
Yongjian Wu
Xiaowen Chu
GNN
81
55
0
20 Oct 2020
BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search
Yunjiang Jiang
Yue Shang
Ziyang Liu
Hongwei Shen
Yun Xiao
Wei Xiong
Sulong Xu
Weipeng P. Yan
Di Jin
64
17
0
20 Oct 2020
Composite Enclaves: Towards Disaggregated Trusted Execution
Moritz Schneider
Aritra Dhar
Ivan Puddu
Kari Kostiainen
Srdjan Capkun
70
17
0
20 Oct 2020
Revisiting BFloat16 Training
Pedram Zamirai
Jian Zhang
Christopher R. Aberger
Christopher De Sa
FedML
MQ
38
20
0
13 Oct 2020
MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions
Wenqi Jiang
Zhen He
Shuai Zhang
Thomas B. Preußer
Kai Zeng
...
Tongxuan Liu
Yong Li
Jingren Zhou
Ce Zhang
Gustavo Alonso
60
7
0
12 Oct 2020
DESCNet: Developing Efficient Scratchpad Memories for Capsule Network Hardware
Alberto Marchisio
Vojtěch Mrázek
Muhammad Abdullah Hanif
Mohamed Bennai
41
12
0
12 Oct 2020
Cross-Stack Workload Characterization of Deep Recommendation Systems
Samuel Hsia
Udit Gupta
Mark Wilkening
Carole-Jean Wu
Gu-Yeon Wei
David Brooks
BDL
GNN
HAI
137
32
0
10 Oct 2020
A Tensor Compiler for Unified Machine Learning Prediction Serving
Supun Nakandala
Karla Saur
Gyeong-In Yu
Konstantinos Karanasos
Carlo Curino
Markus Weimer
Matteo Interlandi
98
53
0
09 Oct 2020
A Novel ANN Structure for Image Recognition
Shilpa Mayannavar
U. Wali
V. M. Aparanji
22
3
0
09 Oct 2020
Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition
Anshuman Tripathi
Jaeyoung Kim
Qian Zhang
Han Lu
Hasim Sak
71
43
0
07 Oct 2020
Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications
Matthew Khoury
Rumen Dangovski
L. Ou
Preslav Nakov
Yichen Shen
L. Jing
44
0
0
06 Oct 2020
Learned Hardware/Software Co-Design of Neural Accelerators
Zhan Shi
Chirag Sakhuja
Milad Hashemi
Kevin Swersky
Calvin Lin
76
15
0
05 Oct 2020
Local Label Point Correction for Edge Detection of Overlapping Cervical Cells
Jiawei Liu
Huijie Fan
Qiang Wang
Wentao Li
Yandong Tang
Danbo Wang
Mingyi Zhou
Li Chen
61
11
0
05 Oct 2020
Neighbourhood Distillation: On the benefits of non end-to-end distillation
Laetitia Shao
Max Moroz
Elad Eban
Yair Movshovitz-Attias
ODL
49
0
0
02 Oct 2020
EigenGame: PCA as a Nash Equilibrium
I. Gemp
Brian McWilliams
Claire Vernade
T. Graepel
114
48
0
01 Oct 2020
A compute-bound formulation of Galerkin model reduction for linear time-invariant dynamical systems
F. Rizzi
E. Parish
P. Blonigan
John Tencer
32
1
0
24 Sep 2020
Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training
Dingqing Yang
Amin Ghasemazar
X. Ren
Maximilian Golub
G. Lemieux
Mieszko Lis
71
49
0
23 Sep 2020
E-BATCH: Energy-Efficient and High-Throughput RNN Batching
Franyell Silfa
J. Arnau
Antonio González
40
12
0
22 Sep 2020
DeepDyve: Dynamic Verification for Deep Neural Networks
Yu Li
Min Li
Bo Luo
Ye Tian
Qiang Xu
AAML
89
31
0
21 Sep 2020
GrateTile: Efficient Sparse Tensor Tiling for CNN Processing
Yu-Sheng Lin
Hung Chang Lu
Yang-Bin Tsao
Yi-Min Chih
Wei-Chao Chen
Shao-Yi Chien
13
5
0
18 Sep 2020
Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics
Li Li
Stephan Hoyer
Ryan Pederson
Ruoxi Sun
E. D. Cubuk
Patrick F. Riley
K. Burke
AI4CE
94
124
0
17 Sep 2020
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
197
80
0
17 Sep 2020
Large-Scale Intelligent Microservices
Mark Hamilton
Nick Gonsalves
Christina Lee
Anand Raman
Brendan Walsh
...
Dalitso Banda
Lucy Zhang
Mei Gao
Lei Zhang
William T. Freeman
SyDa
AI4TS
35
5
0
17 Sep 2020
The Hardware Lottery
Sara Hooker
96
213
0
14 Sep 2020