In-Datacenter Performance Analysis of a Tensor Processing Unit

16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon

Papers citing "In-Datacenter Performance Analysis of a Tensor Processing Unit"

50 / 1,167 papers shown
Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models
Amirali Boroumand
Saugata Ghose
Berkin Akin
Ravi Narayanaswami
Geraldo F. Oliveira
Xiaoyu Ma
Eric Shiu
O. Mutlu
71
29
0
01 Mar 2021
Accelerating Recommendation System Training by Leveraging Popular Choices
Muhammad Adnan
Yassaman Ebrahimzadeh Maboud
Divyat Mahajan
Prashant J. Nair
86
60
0
01 Mar 2021
On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal
Hongyi Wang
Shivaram Venkataraman
Dimitris Papailiopoulos
107
47
0
28 Feb 2021
Swift for TensorFlow: A portable, flexible platform for deep learning
Brennan Saeta
Denys Shabalin
M. Rasi
Brad Larson
Xihui Wu
...
Saleem Abdulrasool
A. Efremov
Dave Abrahams
Chris Lattner
Richard Wei
HAI
69
11
0
26 Feb 2021
LogME: Practical Assessment of Pre-trained Models for Transfer Learning
Kaichao You
Yong Liu
Jianmin Wang
Mingsheng Long
99
189
0
22 Feb 2021
An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks
K. Seshadri
Berkin Akin
James Laudon
Ravi Narayanaswami
Amir Yazdanbakhsh
105
121
0
20 Feb 2021
Control Variate Approximation for DNN Accelerators
Georgios Zervakis
Ourania Spantidi
Iraklis Anagnostopoulos
H. Amrouch
J. Henkel
BDL
55
24
0
18 Feb 2021
Combinatorial optimization and reasoning with graph neural networks
Quentin Cappart
Didier Chételat
Elias Boutros Khalil
Andrea Lodi
Christopher Morris
Petar Velickovic
AI4CE
130
361
0
18 Feb 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
359
181
0
17 Feb 2021
A Survey of Machine Learning for Computer Architecture and Systems
Nan Wu
Yuan Xie
AI4TS
AI4CE
108
153
0
16 Feb 2021
GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent
Heesu Kim
Hanmin Park
Taehyun Kim
Kwanheum Cho
Eojin Lee
Soojung Ryu
Hyuk-Jae Lee
Kiyoung Choi
Jinho Lee
66
37
0
15 Feb 2021
CrossLight: A Cross-Layer Optimized Silicon Photonic Neural Network Accelerator
Febin P. Sunny
Asif Mirza
Mahdi Nikdast
S. Pasricha
57
72
0
13 Feb 2021
Discovery of Options via Meta-Learned Subgoals
Vivek Veeriah
Tom Zahavy
Matteo Hessel
Zhongwen Xu
Junhyuk Oh
Iurii Kemaev
H. V. Hasselt
David Silver
Satinder Singh
82
33
0
12 Feb 2021
A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes
Zachary Nado
Justin M. Gilmer
Christopher J. Shallue
Rohan Anil
George E. Dahl
ODL
100
27
0
12 Feb 2021
Temporal Parallelization of Inference in Hidden Markov Models
S. S. Hassan
Simo Särkkä
Á. F. García-Fernández
TPM
39
12
0
10 Feb 2021
Searching for Fast Model Families on Datacenter Accelerators
Sheng Li
Mingxing Tan
Ruoming Pang
Andrew Li
Liqun Cheng
Quoc V. Le
N. Jouppi
90
34
0
10 Feb 2021
Colorization Transformer
Manoj Kumar
Dirk Weissenborn
Nal Kalchbrenner
ViT
346
160
0
08 Feb 2021
Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models
Shang Wang
Peiming Yang
Yuxuan Zheng
Xuelong Li
Gennady Pekhimenko
82
22
0
03 Feb 2021
Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines
Francisco Romero
Mark Zhao
N. Yadwadkar
Christos Kozyrakis
91
109
0
03 Feb 2021
Truly Sparse Neural Networks at Scale
Selima Curci
Decebal Constantin Mocanu
Mykola Pechenizkiy
141
22
0
02 Feb 2021
A Runtime-Based Computational Performance Predictor for Deep Neural Network Training
Geoffrey X. Yu
Yubo Gao
P. Golikov
Gennady Pekhimenko
3DH
69
68
0
31 Jan 2021
Parallel Iterated Extended and Sigma-point Kalman Smoothers
F. Yaghoobi
Adrien Corenflos
Sakira Hassan
Simo Särkkä
39
13
0
31 Jan 2021
A Competitive Edge: Can FPGAs Beat GPUs at DCNN Inference Acceleration in Resource-Limited Edge Computing Applications?
Ian Colbert
Jake Daly
Ken Kreutz-Delgado
Srinjoy Das
49
13
0
30 Jan 2021
Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators
Hamzah Abdel-Aziz
Ali Shafiee
J. Shin
A. Pedram
Joseph Hassoun
MQ
72
11
0
27 Jan 2021
TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models
Chunxing Yin
Bilge Acun
Xing Liu
Carole-Jean Wu
99
106
0
25 Jan 2021
AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence
Yunhe Wang
Mingqiang Huang
Kai Han
Hanting Chen
Wei Zhang
Chunjing Xu
Dacheng Tao
107
36
0
25 Jan 2021
Pruning and Quantization for Deep Neural Network Acceleration: A Survey
Tailin Liang
C. Glossner
Lei Wang
Shaobo Shi
Xiaotong Zhang
MQ
250
710
0
24 Jan 2021
MinConvNets: A new class of multiplication-less Neural Networks
Xuecan Yang
S. Chaudhuri
Laurence Likforman
L. Naviner
21
0
0
23 Jan 2021
Direct Spatial Implementation of Sparse Matrix Multipliers for Reservoir Computing
Matthew Denton
H. Schmit
101
2
0
21 Jan 2021
Clairvoyant Prefetching for Distributed Machine Learning I/O
Nikoli Dryden
Roman Böhringer
Tal Ben-Nun
Torsten Hoefler
79
58
0
21 Jan 2021
Accelerating Deep Learning Inference via Learned Caches
Arjun Balasubramanian
Adarsh Kumar
Yuhan Liu
Han Cao
Shivaram Venkataraman
Aditya Akella
64
19
0
18 Jan 2021
NNStreamer: Efficient and Agile Development of On-Device AI Systems
MyungJoo Ham
Jijoong Moon
Geunsik Lim
Jaeyun Jung
Hyoungjoo Ahn
...
Parichay Kapoor
Dongju Chae
Gichan Jang
Y. Ahn
Jihoon Lee
67
6
0
16 Jan 2021
STENCIL-NET: Data-driven solution-adaptive discretization of partial differential equations
Suryanarayana Maddu
D. Sturm
B. Cheeseman
Christian L. Müller
I. Sbalzarini
47
8
0
15 Jan 2021
Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration
A. Samajdar
Michael Pellauer
T. Krishna
87
4
0
12 Jan 2021
TensorX: Extensible API for Neural Network Model Design and Deployment
Davide Nunes
Luis M. Antunes
31
0
0
29 Dec 2020
SimBricks: End-to-End Network System Evaluation with Modular Simulation
Hejing Li
Jialin Li
Antoine Kaufmann
18
20
0
28 Dec 2020
Assured RL: Reinforcement Learning with Almost Sure Constraints
Agustin Castellano
J. Bazerque
Enrique Mallada
40
1
0
24 Dec 2020
AutonoML: Towards an Integrated Framework for Autonomous Machine Learning
D. Kedziora
Katarzyna Musial
Bogdan Gabrys
90
17
0
23 Dec 2020
Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead
Maurizio Capra
Beatrice Bussolino
Alberto Marchisio
Guido Masera
Maurizio Martina
Mohamed Bennai
BDL
129
147
0
21 Dec 2020
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Hanrui Wang
Zhekai Zhang
Song Han
156
399
0
17 Dec 2020
Real-time Multi-Task Diffractive Deep Neural Networks via Hardware-Software Co-design
Yingjie Li
Ruiyang Chen
B. S. Rodriguez
Weilu Gao
Cunxi Yu
75
4
0
16 Dec 2020
A hybrid quantum-classical neural network with deep residual learning
Yanying Liang
Wei Peng
Zhu-Jun Zheng
Olli Silvén
Guoying Zhao
62
48
0
14 Dec 2020
Neighbors From Hell: Voltage Attacks Against Deep Learning Accelerators on Multi-Tenant FPGAs
Andrew Boutros
Mathew Hall
Nicolas Papernot
Vaughn Betz
60
41
0
14 Dec 2020
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging
Rohit Prabhavalkar
Yanzhang He
David Rybach
S. Campbell
A. Narayanan
Trevor Strohman
Tara N. Sainath
125
35
0
12 Dec 2020
Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment
Julien Launay
Iacopo Poli
Kilian Muller
Gustave Pariente
I. Carron
L. Daudet
Florent Krzakala
S. Gigan
MoE
75
18
0
11 Dec 2020
Imitating Interactive Intelligence
Josh Abramson
Arun Ahuja
Iain Barr
Arthur Brussee
Federico Carnevale
...
Greg Wayne
Duncan Williams
Nathaniel Wong
Chen Yan
Rui Zhu
LM&Ro
91
71
0
10 Dec 2020
The Why, What and How of Artificial General Intelligence Chip Development
Alex P. James
71
23
0
08 Dec 2020
Real-Time Formal Verification of Autonomous Systems With An FPGA
Minh Bui
Michael Lu
Reza Hojabr
Mo Chen
Arrvindh Shriraman
26
4
0
07 Dec 2020
Monadic Pavlovian associative learning in a backpropagation-free photonic network
James Y. S. Tan
Zengguang Cheng
J. Feldmann
Xuan Li
Nathan Youngblood
U. E. Ali
David Wright
W. Pernice
H. Bhaskaran
48
14
0
30 Nov 2020
EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference
Thierry Tambe
Coleman Hooper
Lillian Pentecost
Tianyu Jia
En-Yu Yang
...
Victor Sanh
P. Whatmough
Alexander M. Rush
David Brooks
Gu-Yeon Wei
112
126
0
28 Nov 2020