ResearchTrend.AI

In-Datacenter Performance Analysis of a Tensor Processing Unit
arXiv:1704.04760, 16 April 2017
N. Jouppi, C. Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Taraneh Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, R. Hundt, Dan Hurt, Julian Ibarz, A. Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, R. Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Hyun Yoon

Papers citing "In-Datacenter Performance Analysis of a Tensor Processing Unit"

50 / 1,167 papers shown
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, M. Krikun, Noam M. Shazeer, Zhiwen Chen (30 Jun 2020)

Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs
Ang Li, Simon Su (30 Jun 2020)

Efficient Algorithms for Device Placement of DNN Graph Operators
Jakub Tarnawski, Amar Phanishayee, Nikhil R. Devanur, Divya Mahajan, Fanny Nina Paravecino (29 Jun 2020)

DRACO: Co-Optimizing Hardware Utilization, and Performance of DNNs on Systolic Accelerator
N. Jha, Shreyas Ravishankar, Sparsh Mittal, Arvind Kaushik, D. Mandal, M. Chandra (26 Jun 2020)
On the Difficulty of Designing Processor Arrays for Deep Neural Networks
Kevin Stehle, Günther Schindler, Holger Fröning (24 Jun 2020)

Inference with Artificial Neural Networks on Analog Neuromorphic Hardware
Johannes Weis, Philipp Spilger, Sebastian Billaudelle, Yannik Stradmann, Arne Emmel, ..., V. Karasenko, Mitja Kleider, Christian Mauch, Korbinian Schreiber, Johannes Schemmel (23 Jun 2020)

Similarity Search with Tensor Core Units
Thomas Dybdahl Ahle, Francesco Silvestri (22 Jun 2020)

Quantum Computing Methods for Supervised Learning
V. Kulkarni, Milind Kulkarni, Aniruddha Pant (22 Jun 2020)

Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift
Zachary Nado, Shreyas Padhy, D. Sculley, Alexander D'Amour, Balaji Lakshminarayanan, Jasper Snoek (19 Jun 2020)
Caffe Barista: Brewing Caffe with FPGAs in the Training Loop
D. A. Vink, A. Rajagopal, Stylianos I. Venieris, C. Bouganis (18 Jun 2020)

A Review of 1D Convolutional Neural Networks toward Unknown Substance Identification in Portable Raman Spectrometer
Mohammad Mozaffari, L. Tay (18 Jun 2020)

Efficient Execution of Quantized Deep Learning Models: A Compiler Approach
Animesh Jain, Shoubhik Bhattacharya, Masahiro Masuda, Vin Sharma, Yida Wang (18 Jun 2020)

Dynamic Tensor Rematerialization
Marisa Kirisame, Steven Lyubomirsky, Altan Haan, Jennifer Brennan, Mike He, Jared Roesch, Tianqi Chen, Zachary Tatlock (17 Jun 2020)
Memory-Efficient Pipeline-Parallel DNN Training
Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei A. Zaharia (16 Jun 2020)

Logically Synthesized, Hardware-Accelerated, Restricted Boltzmann Machines for Combinatorial Optimization and Integer Factorization
Saavan Patel, Philip Canoza, Sayeef Salahuddin (16 Jun 2020)

Multi-Precision Policy Enforced Training (MuPPET): A precision-switching strategy for quantised fixed-point training of CNNs
A. Rajagopal, D. A. Vink, Stylianos I. Venieris, C. Bouganis (16 Jun 2020)

SECure: A Social and Environmental Certificate for AI Systems
Abhishek Gupta, Camylle Lanteigne, Sara Kingsley (11 Jun 2020)

STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators
Francisco Munoz-Martínez, José L. Abellán, M. Acacio, T. Krishna (10 Jun 2020)
Making Convolutions Resilient via Algorithm-Based Error Detection Techniques
S. Hari, Michael B. Sullivan, Timothy Tsai, S. Keckler (08 Jun 2020)

EDCompress: Energy-Aware Model Compression for Dataflows
Zhehui Wang, Yaoyu Zhang, Qiufeng Wang, Rick Siow Mong Goh (08 Jun 2020)

Generative Design of Hardware-aware DNNs
Sheng-Chun Kao, Arun Ramamurthy, T. Krishna (06 Jun 2020)

High-level Modeling of Manufacturing Faults in Deep Neural Network Accelerators
Shamik Kundu, Ahmet Soyyiğit, K. A. Hoque, K. Basu (05 Jun 2020)

Sponge Examples: Energy-Latency Attacks on Neural Networks
Ilia Shumailov, Yiren Zhao, Daniel Bates, Nicolas Papernot, Robert D. Mullins, Ross J. Anderson (05 Jun 2020)
Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training
Hongyu Zhu, Amar Phanishayee, Gennady Pekhimenko (05 Jun 2020)

Exploring the Potential of Low-bit Training of Convolutional Neural Networks
Kai Zhong, Xuefei Ning, Guohao Dai, Zhenhua Zhu, Tianchen Zhao, Shulin Zeng, Yu Wang, Huazhong Yang (04 Jun 2020)

Serving DNNs like Clockwork: Performance Predictability from the Bottom Up
A. Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace (03 Jun 2020)

Light-in-the-loop: using a photonics co-processor for scalable training of neural networks
Julien Launay, Iacopo Poli, Kilian Muller, I. Carron, L. Daudet, Florent Krzakala, S. Gigan (02 Jun 2020)
PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
Sanket Tavarageri, A. Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul (02 Jun 2020)

Vyasa: A High-Performance Vectorizing Compiler for Tensor Convolutions on the Xilinx AI Engine
Prasanth Chatarasi, S. Neuendorffer, Samuel Bayliss, K. Vissers, Vivek Sarkar (02 Jun 2020)

SiEVE: Semantically Encoded Video Analytics on Edge and Cloud
Tarek Elgamal, Shu Shi, Varun Gupta, R. Jana, Klara Nahrstedt (01 Jun 2020)

Artificial neural networks for neuroscientists: A primer
G. R. Yang, Xiao-Jing Wang (01 Jun 2020)
Climbing down Charney's ladder: Machine Learning and the post-Dennard era of computational climate science
Venkatramani Balaji (24 May 2020)

HyperLogLog Sketch Acceleration on FPGA
Amit Kulkarni, Monica Chiosa, Thomas B. Preußer, Kaan Kara, David Sidler, Gustavo Alonso (24 May 2020)

Conditionally Deep Hybrid Neural Networks Across Edge and Cloud
Yinghan Long, I. Chakraborty, Kaushik Roy (21 May 2020)

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
Zhi-Gang Liu, P. Whatmough, Matthew Mattina (16 May 2020)

OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training
Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao (14 May 2020)
Deep Learning: Our Miraculous Year 1990-1991
J. Schmidhuber (12 May 2020)

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations
Ranggi Hwang, Taehun Kim, Youngeun Kwon, Minsoo Rhu (12 May 2020)

Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Dhiraj D. Kalamkar, E. Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, A. Heinecke (10 May 2020)

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos (08 May 2020)

One-step regression and classification with crosspoint resistive memory arrays
Zhong Sun, Giacomo Pedretti, A. Bricalli, Daniele Ielmini (05 May 2020)
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
Behzad Salami, Erhan Baturay Onural, Ismail Emir Yüksel, Fahrettin Koc, Oguz Ergin, A. Cristal, O. Unsal, H. Sarbazi-Azad, O. Mutlu (04 May 2020)

Spiking Neural Networks Hardware Implementations and Challenges: a Survey
Maxence Bouvier, A. Valentian, T. Mesquida, F. Rummens, M. Reyboz, Elisa Vianello, E. Beigné (04 May 2020)

TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain
Weitao Li, Pengfei Xu, Yang Zhao, Haitong Li, Yuan Xie, Yingyan Lin (03 May 2020)

Lupulus: A Flexible Hardware Accelerator for Neural Networks
Andreas Toftegaard Kristensen, R. Giterman, Alexios Balatsoukas-Stimming, A. Burg (03 May 2020)
AIBench Training: Balanced Industry-Standard AI Training Benchmarking
Fei Tang, Wanling Gao, Jianfeng Zhan, Chuanxin Lan, Xu Wen, ..., Yatao Li, Junchao Shao, Zhenyu Wang, Xiaoyu Wang, Hainan Ye (30 Apr 2020)

Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra
M. Soltaniyeh, R. Martin, Santosh Nagarakatte (29 Apr 2020)

FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training
Sangkug Lym, M. Erez (27 Apr 2020)

Memory-efficient training with streaming dimensionality reduction
Siyuan Huang, Brian D. Hoskins, M. Daniels, M. D. Stiles, G. Adam (25 Apr 2020)

PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices
Chunhua Deng, Siyu Liao, Yi Xie, Keshab K. Parhi, Xuehai Qian, Bo Yuan (23 Apr 2020)