1704.04760
Cited By
In-Datacenter Performance Analysis of a Tensor Processing Unit
16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
Papers citing
"In-Datacenter Performance Analysis of a Tensor Processing Unit"
50 / 1,165 papers shown
Title
Distributed Deep Reinforcement Learning: An Overview
Mohammad Reza Samsami
Hossein Alimadad
OffRL
14
27
0
22 Nov 2020
FPGA deep learning acceleration based on convolutional neural network
Xiong Jun
14
2
0
17 Nov 2020
Customizing Trusted AI Accelerators for Efficient Privacy-Preserving Machine Learning
Peichen Xie
Xuanle Ren
Guangyu Sun
FedML
14
6
0
12 Nov 2020
DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator
Zihan Liu
Jingwen Leng
Quan Chen
Chao Li
Wenli Zheng
Li-Wei Li
Minyi Guo
9
8
0
11 Nov 2020
ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing
Cheng Tan
Chenhao Xie
Tong Geng
Andres Marquez
Antonino Tumeo
Kevin J. Barker
Ang Li
15
13
0
10 Nov 2020
Exploring the limits of Concurrency in ML Training on Google TPUs
Sameer Kumar
James Bradbury
C. Young
Yu Emma Wang
Anselm Levskaya
...
Tao Wang
Tayo Oguntebi
Yazhou Zu
Yuanzhong Xu
Andy Swing
BDL
AIMat
MoE
LRM
25
27
0
07 Nov 2020
Highly Available Data Parallel ML training on Mesh Networks
Sameer Kumar
N. Jouppi
MoE
AI4CE
6
9
0
06 Nov 2020
ReFloat: Low-Cost Floating-Point Processing in ReRAM for Accelerating Iterative Linear Solvers
Linghao Song
Fan Chen
Xuehai Qian
Hai Li
Yiran Chen
6
4
0
06 Nov 2020
CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery
Kiwan Maeng
Shivam Bharuka
Isabel Gao
M. C. Jeffrey
V. Saraph
...
Caroline Trippel
Jiyan Yang
Michael G. Rabbat
Brandon Lucia
Carole-Jean Wu
OffRL
37
31
0
05 Nov 2020
InferBench: Understanding Deep Learning Inference Serving with an Automatic Benchmarking System
Huaizheng Zhang
Yizheng Huang
Yonggang Wen
Jianxiong Yin
K. Guan
22
3
0
04 Nov 2020
Cortex: A Compiler for Recursive Deep Learning Models
Pratik Fegade
Tianqi Chen
Phillip B. Gibbons
T. Mowry
VLM
16
28
0
02 Nov 2020
Photonics for artificial intelligence and neuromorphic computing
B. Shastri
A. Tait
T. F. D. Lima
W. Pernice
H. Bhaskaran
C. Wright
Paul R. Prucnal
25
1,178
0
30 Oct 2020
Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour
Arissa Wongpanich
Hieu H. Pham
J. Demmel
Mingxing Tan
Quoc V. Le
Yang You
Sameer Kumar
19
8
0
30 Oct 2020
Training Speech Recognition Models with Federated Learning: A Quality/Cost Framework
Dhruv Guliani
F. Beaufays
Giovanni Motta
FedML
13
80
0
29 Oct 2020
Systolic Computing on GPUs for Productive Performance
Hongbo Rong
Xiaochen Hao
Yun Liang
Lidong Xu
Hong Jiang
Pradeep Dubey
14
1
0
29 Oct 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
8
85
0
27 Oct 2020
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Jens Domke
Emil Vatai
Aleksandr Drozd
Peng Chen
Yosuke Oyama
...
Shweta Salaria
Daichi Mukunoki
Artur Podobas
Mohamed Wahib
Satoshi Matsuoka
40
24
0
27 Oct 2020
Stochastic Optimization with Laggard Data Pipelines
Naman Agarwal
Rohan Anil
Tomer Koren
Kunal Talwar
Cyril Zhang
21
12
0
26 Oct 2020
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
Minjia Zhang
Yuxiong He
AI4CE
13
100
0
26 Oct 2020
LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
Yujeong Choi
Yunseong Kim
Minsoo Rhu
24
66
0
25 Oct 2020
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training
Youngeun Kwon
Yunjae Lee
Minsoo Rhu
27
40
0
25 Oct 2020
R-TOD: Real-Time Object Detector with Minimized End-to-End Delay for Autonomous Driving
Won-Seok Jang
Hansaem Jeong
Kyungtae Kang
N. Dutt
Jong-Chan Kim
13
24
0
23 Oct 2020
Brain-Inspired Learning on Neuromorphic Substrates
Friedemann Zenke
Emre Neftci
38
89
0
22 Oct 2020
Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters
Shaoshuai Shi
Xianhao Zhou
Shutao Song
Xingyao Wang
Zilin Zhu
...
Chenyang Guo
Bo Yang
Zhibo Chen
Yongjian Wu
Xiaowen Chu
GNN
23
55
0
20 Oct 2020
BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search
Yunjiang Jiang
Yue Shang
Ziyang Liu
Hongwei Shen
Yun Xiao
Wei Xiong
Sulong Xu
Weipeng P. Yan
Di Jin
34
17
0
20 Oct 2020
Composite Enclaves: Towards Disaggregated Trusted Execution
Moritz Schneider
Aritra Dhar
Ivan Puddu
Kari Kostiainen
Srdjan Capkun
18
16
0
20 Oct 2020
Revisiting BFloat16 Training
Pedram Zamirai
Jian Zhang
Christopher R. Aberger
Christopher De Sa
FedML
MQ
29
20
0
13 Oct 2020
MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions
Wenqi Jiang
Zhen He
Shuai Zhang
Thomas B. Preußer
Kai Zeng
...
Tongxuan Liu
Yong Li
Jingren Zhou
Ce Zhang
Gustavo Alonso
42
7
0
12 Oct 2020
DESCNet: Developing Efficient Scratchpad Memories for Capsule Network Hardware
Alberto Marchisio
Vojtěch Mrázek
Muhammad Abdullah Hanif
Mohamed Bennai
11
12
0
12 Oct 2020
Cross-Stack Workload Characterization of Deep Recommendation Systems
Samuel Hsia
Udit Gupta
Mark Wilkening
Carole-Jean Wu
Gu-Yeon Wei
David Brooks
BDL
GNN
HAI
28
32
0
10 Oct 2020
A Tensor Compiler for Unified Machine Learning Prediction Serving
Supun Nakandala
Karla Saur
Gyeong-In Yu
Konstantinos Karanasos
Carlo Curino
Markus Weimer
Matteo Interlandi
37
53
0
09 Oct 2020
A Novel ANN Structure for Image Recognition
Shilpa Mayannavar
U. Wali
V. M. Aparanji
8
3
0
09 Oct 2020
Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition
Anshuman Tripathi
Jaeyoung Kim
Qian Zhang
Han Lu
Hasim Sak
9
42
0
07 Oct 2020
Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications
Matthew Khoury
Rumen Dangovski
L. Ou
Preslav Nakov
Yichen Shen
L. Jing
31
0
0
06 Oct 2020
Learned Hardware/Software Co-Design of Neural Accelerators
Zhan Shi
Chirag Sakhuja
Milad Hashemi
Kevin Swersky
Calvin Lin
19
15
0
05 Oct 2020
Local Label Point Correction for Edge Detection of Overlapping Cervical Cells
Jiawei Liu
Huijie Fan
Qiang Wang
Wentao Li
Yandong Tang
Danbo Wang
Mingyi Zhou
Li Chen
18
9
0
05 Oct 2020
Neighbourhood Distillation: On the benefits of non end-to-end distillation
Laetitia Shao
Max Moroz
Elad Eban
Yair Movshovitz-Attias
ODL
18
0
0
02 Oct 2020
EigenGame: PCA as a Nash Equilibrium
I. Gemp
Brian McWilliams
Claire Vernade
T. Graepel
32
46
0
01 Oct 2020
A compute-bound formulation of Galerkin model reduction for linear time-invariant dynamical systems
F. Rizzi
E. Parish
P. Blonigan
John Tencer
9
1
0
24 Sep 2020
Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training
Dingqing Yang
Amin Ghasemazar
X. Ren
Maximilian Golub
G. Lemieux
Mieszko Lis
22
48
0
23 Sep 2020
E-BATCH: Energy-Efficient and High-Throughput RNN Batching
Franyell Silfa
J. Arnau
Antonio González
30
11
0
22 Sep 2020
DeepDyve: Dynamic Verification for Deep Neural Networks
Yu Li
Min Li
Bo Luo
Ye Tian
Qiang Xu
AAML
16
30
0
21 Sep 2020
GrateTile: Efficient Sparse Tensor Tiling for CNN Processing
Yu-Sheng Lin
Hung Chang Lu
Yang-Bin Tsao
Yi-Min Chih
Wei-Chao Chen
Shao-Yi Chien
4
5
0
18 Sep 2020
Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics
Li Li
Stephan Hoyer
Ryan Pederson
Ruoxi Sun
E. D. Cubuk
Patrick F. Riley
K. Burke
AI4CE
37
120
0
17 Sep 2020
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
44
79
0
17 Sep 2020
Large-Scale Intelligent Microservices
Mark Hamilton
Nick Gonsalves
Christina Lee
Anand Raman
Brendan Walsh
...
Dalitso Banda
Lucy Zhang
Mei Gao
Lei Zhang
William T. Freeman
SyDa
AI4TS
11
5
0
17 Sep 2020
The Hardware Lottery
Sara Hooker
29
204
0
14 Sep 2020
DANCE: Differentiable Accelerator/Network Co-Exploration
Kanghyun Choi
Deokki Hong
Hojae Yoon
Joonsang Yu
Youngsok Kim
Jinho Lee
25
45
0
14 Sep 2020
Time-Based Roofline for Deep Learning Performance Analysis
Yunsong Wang
Charlene Yang
S. Farrell
Yan Zhang
Thorsten Kurth
Samuel Williams
27
17
0
09 Sep 2020
not-so-BigGAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution
Seung-Jun Han
Akash Srivastava
C. Hurwitz
P. Sattigeri
David D. Cox
14
8
0
09 Sep 2020