In-Datacenter Performance Analysis of a Tensor Processing Unit

16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon

Papers citing "In-Datacenter Performance Analysis of a Tensor Processing Unit"

50 / 1,165 papers shown
Boosting Retailer Revenue by Generated Optimized Combined Multiple Digital Marketing Campaigns
Yafei Xu
Tian Xie
Yu Zhang
21
1
0
09 Sep 2020
An FPGA Accelerated Method for Training Feed-forward Neural Networks Using Alternating Direction Method of Multipliers and LSMR
Seyedeh Niusha Alavi Foumani
Ce Guo
Wayne Luk
22
3
0
06 Sep 2020
Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration
Zhi-Gang Liu
P. Whatmough
Matthew Mattina
15
12
0
04 Sep 2020
Running Neural Networks on the NIC
G. Siracusano
Salvator Galea
D. Sanvito
Mohammad Malekzadeh
Hamed Haddadi
G. Antichi
R. Bifulco
19
25
0
04 Sep 2020
ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning
Sheng-Chun Kao
Geonhwa Jeong
T. Krishna
28
95
0
04 Sep 2020
Layer-specific Optimization for Mixed Data Flow with Mixed Precision in FPGA Design for CNN-based Object Detectors
Duy-Thanh Nguyen
Hyun Kim
Hyuk-Jae Lee
MQ
25
60
0
03 Sep 2020
Transform Quantization for CNN (Convolutional Neural Network) Compression
Sean I. Young
Wang Zhe
David S. Taubman
B. Girod
MQ
36
69
0
02 Sep 2020
TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference
Mostafa Mahmoud
Isak Edo Vivancos
Ali Hadi Zadeh
Omar Mohamed Awad
Gennady Pekhimenko
Jorge Albericio
Andreas Moshovos
MoE
26
59
0
01 Sep 2020
POSEIDON: Privacy-Preserving Federated Neural Network Learning
Sinem Sav
Apostolos Pyrgelis
J. Troncoso-Pastoriza
D. Froelicher
Jean-Philippe Bossuat
João Sá Sousa
Jean-Pierre Hubaux
FedML
21
153
0
01 Sep 2020
Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity
Cong Guo
B. Hsueh
Jingwen Leng
Yuxian Qiu
Yue Guan
Zehuan Wang
Xiaoying Jia
Xipeng Li
Minyi Guo
Yuhao Zhu
35
83
0
29 Aug 2020
Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Aurick Qiao
Sang Keun Choe
Suhas Jayaram Subramanya
Willie Neiswanger
Qirong Ho
Hao Zhang
G. Ganger
Eric Xing
VLM
16
179
0
27 Aug 2020
On the Intrinsic Robustness of NVM Crossbars Against Adversarial Attacks
Deboleena Roy
I. Chakraborty
Timur Ibrayev
Kaushik Roy
AAML
19
4
0
27 Aug 2020
CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity Edge Devices
Parth Mannan
A. Samajdar
T. Krishna
31
2
0
27 Aug 2020
GuardNN: Secure Accelerator Architecture for Privacy-Preserving Deep Learning
Weizhe Hua
M. Umar
Zhiru Zhang
G. E. Suh
FedML
24
29
0
26 Aug 2020
Tearing Down the Memory Wall
Zaid Qureshi
Vikram Sharma Mailthody
S. Min
I-Hsin Chung
Jinjun Xiong
Wen-mei W. Hwu
GNN
31
9
0
24 Aug 2020
Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
Deepak Narayanan
Keshav Santhanam
Fiodar Kazhamiaka
Amar Phanishayee
Matei A. Zaharia
9
204
0
20 Aug 2020
Training of mixed-signal optical convolutional neural network with reduced quantization level
Joseph Ulseth
Zheyuan Zhu
Guifang Li
Shuo Pang
10
0
0
20 Aug 2020
Mesorasi: Architecture Support for Point Cloud Analytics via Delayed-Aggregation
Yu Feng
Boyuan Tian
Tiancheng Xu
P. Whatmough
Yuhao Zhu
3DPC
32
58
0
16 Aug 2020
Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training
Geoffrey X. Yu
Tovi Grossman
Gennady Pekhimenko
24
17
0
15 Aug 2020
SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Stefanos Laskaridis
Stylianos I. Venieris
Mario Almeida
Ilias Leontiadis
Nicholas D. Lane
30
266
0
14 Aug 2020
Weight Equalizing Shift Scaler-Coupled Post-training Quantization
Jihun Oh
Sangjeong Lee
Meejeong Park
Pooni Walagaurav
K. Kwon
MQ
25
1
0
13 Aug 2020
tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads
Steven W. D. Chien
Artur Podobas
Ivy Bo Peng
Stefano Markidis
24
11
0
10 Aug 2020
Spatial Sharing of GPU for Autotuning DNN models
Aditya Dhakal
Junguk Cho
Sameer G. Kulkarni
K. Ramakrishnan
P. Sharma
24
8
0
08 Aug 2020
A Learned Performance Model for Tensor Processing Units
Samuel J. Kaufman
P. Phothilimthana
Yanqi Zhou
Charith Mendis
Sudip Roy
Amit Sabne
Mike Burrows
26
8
0
03 Aug 2020
High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands
Dibakar Gope
Jesse G. Beu
Matthew Mattina
20
4
0
03 Aug 2020
Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics
Daniel Kang
A. Mathur
Teja Veeramacheneni
Peter Bailis
Matei A. Zaharia
16
42
0
25 Jul 2020
Dopant Network Processing Units: Towards Efficient Neural-network Emulators with High-capacity Nanoelectronic Nodes
Hans-Christian Ruiz-Euler
Unai Alegre Ibarra
B. van de Ven
H. Broersma
P. Bobbert
Wilfred G. van der Wiel
20
15
0
24 Jul 2020
T-BFA: Targeted Bit-Flip Adversarial Weight Attack
Adnan Siraj Rakin
Zhezhi He
Jingtao Li
Fan Yao
C. Chakrabarti
Deliang Fan
AAML
22
13
0
24 Jul 2020
ZigZag: A Memory-Centric Rapid DNN Accelerator Design Space Exploration Framework
L. Mei
Pouya Houshmand
V. Jain
J. S. P. Giraldo
Marian Verhelst
8
20
0
22 Jul 2020
SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training
Pengcheng Dai
Jianlei Yang
Xucheng Ye
Xingzhou Cheng
Junyu Luo
Linghao Song
Yiran Chen
Weisheng Zhao
25
21
0
21 Jul 2020
SeqPoint: Identifying Representative Iterations of Sequence-based Neural Networks
Suchita Pati
Shaizeen Aga
Matthew D. Sinclair
Nuwan Jayasena
12
10
0
20 Jul 2020
Feature Pyramid Transformer
Dong-Ming Zhang
Hanwang Zhang
Jinhui Tang
Meng Wang
Xiansheng Hua
Qianru Sun
ViT
30
251
0
18 Jul 2020
Discovering Reinforcement Learning Algorithms
Junhyuk Oh
Matteo Hessel
Wojciech M. Czarnecki
Zhongwen Xu
H. V. Hasselt
Satinder Singh
David Silver
29
126
0
17 Jul 2020
DeepNetQoE: Self-adaptive QoE Optimization Framework of Deep Networks
Rui Wang
Min Chen
Nadra Guizani
Yong Li
H. Gharavi
Kai Hwang
26
18
0
17 Jul 2020
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
Shauharda Khadka
Estelle Aflalo
Mattias Marder
Avrech Ben-David
Santiago Miret
Shie Mannor
Tamir Hazan
Hanlin Tang
Somdeb Majumdar
GNN
32
11
0
14 Jul 2020
Analyzing and Mitigating Data Stalls in DNN Training
Jayashree Mohan
Amar Phanishayee
Ashish Raniwala
Vijay Chidambaram
36
105
0
14 Jul 2020
SESAME: Software defined Enclaves to Secure Inference Accelerators with Multi-tenant Execution
Sarbartha Banerjee
Prakash Ramrakhyani
Shijia Wei
Mohit Tiwari
19
9
0
14 Jul 2020
ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing
Ahmed Elnaggar
M. Heinzinger
Christian Dallago
Ghalia Rehawi
Yu Wang
...
Tamas B. Fehér
Christoph Angerer
Martin Steinegger
D. Bhowmik
B. Rost
DRL
20
917
0
13 Jul 2020
Optimizing Prediction Serving on Low-Latency Serverless Dataflow
Vikram Sreekanti
Harikaran Subbaraj
Chenggang Wu
Joseph E. Gonzalez
J. M. Hellerstein
28
21
0
11 Jul 2020
Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications
M. R. Azghadi
Corey Lammie
Jason K. Eshraghian
Melika Payvand
Elisa Donati
B. Linares-Barranco
Giacomo Indiveri
30
139
0
11 Jul 2020
HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks
James Garland
David Gregg
8
0
0
11 Jul 2020
Challenges of AI in Wireless Networks for IoT
Ijaz Ahmad
Shahriar Shahabuddin
T. Kumar
E. Harjula
M. Meisel
M. Juntti
T. Sauter
M. Ylianttila
15
18
0
09 Jul 2020
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
Shail Dave
Riyadh Baghdadi
Tony Nowatzki
Sasikanth Avancha
Aviral Shrivastava
Baoxin Li
64
82
0
02 Jul 2020
Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
Saeed Rashidi
Matthew Denton
Srinivas Sridharan
Sudarshan Srinivasan
Amoghavarsha Suresh
Jade Nie
T. Krishna
34
45
0
30 Jun 2020
Deep neural networks for the evaluation and design of photonic devices
Jiaqi Jiang
Ming-Keh Chen
Jonathan A. Fan
27
394
0
30 Jun 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
36
131
0
30 Jun 2020
Deep Feature Space: A Geometrical Perspective
Ioannis Kansizoglou
Loukas Bampis
Antonios Gasteratos
33
40
0
30 Jun 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhehuai Chen
MoE
43
1,118
0
30 Jun 2020
Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs
Ang Li
Simon Su
MQ
20
35
0
30 Jun 2020
Efficient Algorithms for Device Placement of DNN Graph Operators
Jakub Tarnawski
Amar Phanishayee
Nikhil R. Devanur
Divya Mahajan
Fanny Nina Paravecino
33
66
0
29 Jun 2020