1704.04760
In-Datacenter Performance Analysis of a Tensor Processing Unit
16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
Papers citing
"In-Datacenter Performance Analysis of a Tensor Processing Unit"
50 / 1,167 papers shown
Speck: A Smart event-based Vision Sensor with a low latency 327K Neuron Convolutional Neuronal Network Processing Pipeline
Ole Richter
Y. Xing
M. D. Marchi
Carsten Nielsen
M. Katsimpris
...
SynSense
Bio-Inspired Circuits
Sadique Sheik
T. Demirci
Groningen Cognitive Systems
94
61
0
13 Apr 2023
Training Large Language Models Efficiently with Sparsity and Dataflow
V. Srinivasan
Darshan Gandhi
Urmish Thakker
R. Prabhakar
MoE
69
6
0
11 Apr 2023
SamurAI: A Versatile IoT Node With Event-Driven Wake-Up and Embedded ML Acceleration
I. Miro-Panadès
Benoît Tain
J. Christmann
David Coriat
R. Lemaire
...
Jean-Marc Philippe
Y. Thonnart
A. Valentian
Frédéric Heitzmann
F. Clermidy
46
15
0
11 Apr 2023
Mixed-Precision Random Projection for RandNLA on Tensor Cores
Hiroyuki Ootomo
Rio Yokota
44
3
0
10 Apr 2023
Arithmetic Intensity Balancing Convolution for Hardware-aware Efficient Block Design
Shinkook Choi
Junkyeong Choi
26
1
0
08 Apr 2023
Tensor Slicing and Optimization for Multicore NPUs
R. Sousa
M. Pereira
Yongin Kwon
Taeho Kim
Namsoon Jung
Chang Soo Kim
Michael Frank
Guido Araujo
86
6
0
06 Apr 2023
A differentiable programming framework for spin models
T. S. Farias
V. V. Schultz
José C. M. Mombach
Jonas Maziero
55
1
0
04 Apr 2023
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
N. Jouppi
George Kurian
Sheng Li
Peter C. Ma
R. Nagarajan
...
Brian Towles
C. Young
Xiaoping Zhou
Zongwei Zhou
David A. Patterson
BDL
VLM
169
371
0
04 Apr 2023
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Rami Botros
Anmol Gulati
Tara N. Sainath
K. Choromanski
Ruoming Pang
Trevor Strohman
Weiran Wang
Jiahui Yu
MQ
80
3
0
31 Mar 2023
D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs
Aditya Dhakal
Sameer G. Kulkarni
K. Ramakrishnan
30
4
0
31 Mar 2023
PowerPruning: Selecting Weights and Activations for Power-Efficient Neural Network Acceleration
Richard Petri
Grace Li Zhang
Yiran Chen
Ulf Schlichtmann
Bing Li
29
6
0
24 Mar 2023
Pre-NeRF 360: Enriching Unbounded Appearances for Neural Radiance Fields
Ahmad AlMughrabi
Umair Haroon
Ricardo Marques
Petia Radeva
66
6
0
21 Mar 2023
Economical Quaternion Extraction from a Human Skeletal Pose Estimate using 2-D Cameras
S. Radhakrishna
A. Balasubramanyam
3DH
60
1
0
15 Mar 2023
DeepAxe: A Framework for Exploration of Approximation and Reliability Trade-offs in DNN Accelerators
Mahdi Taheri
M. Riazati
Mohammad Hasan Ahmadilivani
M. Jenihhin
Masoud Daneshtalab
J. Raik
Mikael Sjödin
B. Lisper
76
20
0
14 Mar 2023
X-Former: In-Memory Acceleration of Transformers
S. Sridharan
Jacob R. Stevens
Kaushik Roy
A. Raghunathan
GNN
53
38
0
13 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
107
89
0
06 Mar 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
82
172
0
03 Mar 2023
HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture
Yi-Chien Lin
Viktor Prasanna
GNN
67
7
0
01 Mar 2023
Auxiliary MCMC and particle Gibbs samplers for parallelisable inference in latent dynamical systems
Adrien Corenflos
Simo Särkkä
77
0
0
01 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
163
106
0
27 Feb 2023
MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation
Samuel Hsia
Udit Gupta
Bilge Acun
Newsha Ardalani
Pan Zhong
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
108
17
0
21 Feb 2023
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs
Geonhwa Jeong
S. Damani
Abhimanyu Bambhaniya
Eric Qin
C. Hughes
S. Subramoney
Hyesoon Kim
T. Krishna
MoE
84
26
0
17 Feb 2023
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
Minghao Li
Ran Ben-Basat
S. Vargaftik
Chon-In Lao
Ke Xu
Michael Mitzenmacher
Minlan Yu
94
19
0
16 Feb 2023
Toward matrix multiplication for deep learning inference on the Xilinx Versal
Jie Lei
J. Flich
Enrique S. Quintana-Ortí
29
4
0
15 Feb 2023
Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle
Vanessa Mehlin
Sigurd Schacht
Carsten Lanquillon
HAI
MedIm
127
20
0
05 Feb 2023
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
128
49
0
02 Feb 2023
Bit-balance: Model-Hardware Co-design for Accelerating NNs by Exploiting Bit-level Sparsity
Wenhao Sun
Zhiwei Zou
Deng Liu
Wendi Sun
Song Chen
Yi Kang
MQ
21
7
0
01 Feb 2023
A Green(er) World for A.I
Dan Zhao
Nathan C. Frey
Joseph McDonald
Matthew Hubbell
David Bestor
Michael Jones
Andrew Prout
V. Gadepally
S. Samsi
69
6
0
27 Jan 2023
PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices
Yuji Chai
Devashree Tripathy
Chu Zhou
Dibakar Gope
Igor Fedorov
Ramon Matas
David Brooks
Gu-Yeon Wei
P. Whatmough
GNN
78
5
0
26 Jan 2023
SGCN: Exploiting Compressed-Sparse Features in Deep Graph Convolutional Network Accelerators
Mingi Yoo
Jaeyong Song
Jounghoo Lee
Namhyung Kim
Youngsok Kim
Jinho Lee
GNN
89
22
0
25 Jan 2023
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song
Jinkyu Yim
Jaewon Jung
Hongsun Jang
H. Kim
Youngsok Kim
Jinho Lee
GNN
74
28
0
24 Jan 2023
Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators
Min-hee Yoo
Jaeyong Song
Hyeyoon Lee
Jounghoo Lee
Namhyung Kim
Youngsok Kim
Jinho Lee
GNN
79
5
0
24 Jan 2023
Enabling Hard Constraints in Differentiable Neural Network and Accelerator Co-Exploration
Deokki Hong
Kanghyun Choi
Hyeyoon Lee
Joonsang Yu
Noseong Park
Youngsok Kim
Jinho Lee
46
3
0
23 Jan 2023
Analog, In-memory Compute Architectures for Artificial Intelligence
Patrick Bowen
G. Regev
Nir Regev
Bruno U. Pedroni
Edward Hanson
Yiran Chen
25
3
0
13 Jan 2023
Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics
George Michelogiannakis
Yehia Arafa
B. Cook
Liang Yuan Dai
Abdel-Hameed A. Badawy
Madeleine Glick
Yuyang Wang
Keren Bergman
J. Shalf
57
9
0
09 Jan 2023
FlexShard: Flexible Sharding for Industry-Scale Sequence Recommendation Models
Geet Sethi
Pallab Bhattacharya
Dhruv Choudhary
Carole-Jean Wu
Christos Kozyrakis
79
5
0
08 Jan 2023
A Theory of I/O-Efficient Sparse Neural Network Inference
Niels Gleinig
Tal Ben-Nun
Torsten Hoefler
59
0
0
03 Jan 2023
Accelerating CNN inference on long vector architectures via co-design
Sonia Rani Gupta
Nikela Papadopoulou
Miquel Pericàs
3DV
78
4
0
22 Dec 2022
Annotated History of Modern AI and Deep Learning
Juergen Schmidhuber
MLAU
AI4TS
AI4CE
65
25
0
21 Dec 2022
Sophisticated deep learning with on-chip optical diffractive tensor processing
Yuyao Huang
Tingzhao Fu
Honghao Huang
Sigang Yang
Hong-wei Chen
BDL
20
14
0
20 Dec 2022
AnyTOD: A Programmable Task-Oriented Dialog System
Jeffrey Zhao
Yuan Cao
Raghav Gupta
Harrison Lee
Abhinav Rastogi
Mingqiu Wang
H. Soltau
Izhak Shafran
Yonghui Wu
VLM
92
11
0
20 Dec 2022
Containerisation for High Performance Computing Systems: Survey and Prospects
Naweiluo Zhou
Huan Zhou
Dennis Hoppe
65
27
0
16 Dec 2022
Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks
Mingyu Liang
Wenyin Fu
Louis Feng
Zhongyi Lin
P. Panakanti
Shengbao Zheng
Srinivas Sridharan
Christina Delimitrou
52
12
0
16 Dec 2022
Analytical Engines With Context-Rich Processing: Towards Efficient Next-Generation Analytics
Viktor Sanca
Anastasia Ailamaki
116
4
0
14 Dec 2022
DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling
L. Mei
Koen Goetschalckx
Arne Symons
Marian Verhelst
183
31
0
10 Dec 2022
Integration of a systolic array based hardware accelerator into a DNN operator auto-tuning framework
Federico Nicolás Peccia
Oliver Bringmann
52
5
0
06 Dec 2022
DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation
Liu Ke
Xuan Zhang
Benjamin C. Lee
G. E. Suh
Hsien-Hsin S. Lee
71
8
0
02 Dec 2022
On-device Training: A First Overview on Existing Systems
Shuai Zhu
Thiemo Voigt
Jeonggil Ko
Fatemeh Rahimian
138
17
0
01 Dec 2022
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Trevor Gale
Deepak Narayanan
C. Young
Matei A. Zaharia
MoE
81
109
0
29 Nov 2022
Edge Video Analytics: A Survey on Applications, Systems and Enabling Techniques
Renjie Xu
S. Razavi
Rong Zheng
112
20
0
28 Nov 2022