In-Datacenter Performance Analysis of a Tensor Processing Unit


16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon

Papers citing "In-Datacenter Performance Analysis of a Tensor Processing Unit"

50 / 1,167 papers shown
Title
Towards Efficient Full 8-bit Integer DNN Online Training on Resource-limited Devices without Batch Normalization
Yukuan Yang
Xiaowei Chi
Lei Deng
Tianyi Yan
Feng Gao
Guoqi Li
MQ
74
6
0
27 May 2021
PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
Tianyi Zhang
Jie Lin
Peng Hu
Bin Zhao
M. Aly
45
5
0
27 May 2021
CARLS: Cross-platform Asynchronous Representation Learning System
Chun-Ta Lu
Yun Zeng
Da-Cheng Juan
Yicheng Fan
Zhe Li
...
Ariel Fuxman
Futang Peng
Zhen Li
Tom Duerig
Andrew Tomkins
25
0
0
26 May 2021
A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
Dan Zhang
Safeen Huda
Ebrahim M. Songhori
Kartik Prabhu
Quoc V. Le
Anna Goldie
Azalia Mirhoseini
94
53
0
26 May 2021
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale
Zhaoxia Deng
Deng
Jongsoo Park
P. T. P. Tang
Haixin Liu
...
S. Nadathur
Changkyu Kim
Maxim Naumov
S. Naghshineh
M. Smelyanskiy
59
11
0
26 May 2021
FENXI: Deep-learning Traffic Analytics at the Edge
Massimo Gallo
A. Finamore
G. Simon
Dario Rossi
34
7
0
25 May 2021
GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific Caching
Sudipta Mondal
Susmita Dey Manasi
K. Kunal
S. Ramprasath
S. Sapatnekar
GNN
53
15
0
21 May 2021
RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance
Udit Gupta
Samuel Hsia
J. Zhang
Mark Wilkening
Javin Pombra
Hsien-Hsin S. Lee
Gu-Yeon Wei
Carole-Jean Wu
David Brooks
67
33
0
18 May 2021
SimNet: Accurate and High-Performance Computer Architecture Simulation using Deep Learning
Lingda Li
Santosh Pandey
T. Flynn
Hang Liu
Noel Wheeler
A. Hoisie
42
8
0
12 May 2021
PIM-DRAM: Accelerating Machine Learning Workloads using Processing in Commodity DRAM
Sourjya Roy
M. Ali
A. Raghunathan
19
19
0
08 May 2021
Neural network architectures using min-plus algebra for solving certain high dimensional optimal control problems and Hamilton-Jacobi PDEs
Jérome Darbon
P. Dower
Tingwei Meng
48
22
0
07 May 2021
CoSA: Scheduling by Constrained Optimization for Spatial Accelerators
Qijing Huang
Minwoo Kang
Grace Dinh
Thomas Norell
Aravind Kalaiah
J. Demmel
J. Wawrzynek
Y. Shao
72
112
0
05 May 2021
Modulating Regularization Frequency for Efficient Compression-Aware Model Training
Dongsoo Lee
S. Kwon
Byeongwook Kim
Jeongin Yun
Baeseong Park
Yongkweon Jeon
36
0
0
05 May 2021
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation
Qingcheng Xiao
Wenlei Bao
Bingzhe Wu
Pengcheng Xu
Xuehai Qian
Yun Liang
124
69
0
04 May 2021
Connecting AI Learning and Blockchain Mining in 6G Systems
Yunkai Wei
Zixian An
S. Leng
Kun Yang
27
1
0
29 Apr 2021
MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores
Sheng-Chun Kao
T. Krishna
123
52
0
28 Apr 2021
An optical neural network using less than 1 photon per multiplication
Tianyu Wang
Shifan Ma
Logan G. Wright
Tatsuhiro Onodera
Brian C. Richard
Peter L. McMahon
105
185
0
27 Apr 2021
Efficient training of physics-informed neural networks via importance sampling
M. A. Nabian
R. J. Gladstone
Hadi Meidani
DiffM PINN
135
239
0
26 Apr 2021
Measuring what Really Matters: Optimizing Neural Networks for TinyML
Lennart Heim
Andreas Biri
Zhongnan Qu
Lothar Thiele
79
30
0
21 Apr 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT VGen
314
513
0
20 Apr 2021
DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device
Mario Almeida
Stefanos Laskaridis
Stylianos I. Venieris
Ilias Leontiadis
Nicholas D. Lane
75
37
0
20 Apr 2021
CoDR: Computation and Data Reuse Aware CNN Accelerator
Alireza Khadem
Haojie Ye
T. Mudge
14
0
0
20 Apr 2021
End-to-End Jet Classification of Boosted Top Quarks with the CMS Open Data
Michael Andrews
Bjorn Burkle
Yi-fan Chen
Davide DiCroce
S. Gleyzer
...
N. Pervan
Yusef Shafi
Wei-Ju Sun
Emanuele Usai
Kun Yang
68
10
0
19 Apr 2021
Arithmetic-Intensity-Guided Fault Tolerance for Neural Network Inference on GPUs
J. Kosaian
K. V. Rashmi
72
34
0
19 Apr 2021
Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors
Lukas Baischer
M. Wess
N. Taherinejad
85
13
0
19 Apr 2021
RingCNN: Exploiting Algebraically-Sparse Ring Tensors for Energy-Efficient CNN-Based Computational Imaging
Chao-Tsung Huang
92
10
0
19 Apr 2021
Demystifying BERT: Implications for Accelerator Design
Suchita Pati
Shaizeen Aga
Nuwan Jayasena
Matthew D. Sinclair
LLMAG
88
17
0
14 Apr 2021
Mitigating Adversarial Attack for Compute-in-Memory Accelerator Utilizing On-chip Finetune
Shanshi Huang
Hongwu Jiang
Shimeng Yu
AAML
54
3
0
13 Apr 2021
Podracer architectures for scalable Reinforcement Learning
Matteo Hessel
M. Kroiss
Aidan Clark
Iurii Kemaev
John Quan
Thomas Keck
Fabio Viola
H. V. Hasselt
66
39
0
13 Apr 2021
Optimizing the Whole-life Cost in End-to-end CNN Acceleration
Jiaqi Zhang
Xiangru Chen
S. Ray
32
8
0
12 Apr 2021
Deep Learning and Traffic Classification: Lessons learned from a commercial-grade dataset with hundreds of encrypted and zero-day applications
Lixuan Yang
A. Finamore
Feng Jun
Dario Rossi
27
50
0
07 Apr 2021
A matrix math facility for Power ISA(TM) processors
José Moreira
Kit Barton
Steven J. Battle
Peter Bergner
Ramon Bertran Monfort
...
Rajalakshmi Srinivasaraghavan
Shricharan Srivatsan
Brian W. Thompto
Andreas Wagner
Nelson Wu
11
14
0
07 Apr 2021
GPU Domain Specialization via Composable On-Package Architecture
Yaosheng Fu
Evgeny Bolotin
Niladrish Chatterjee
D. Nellans
S. Keckler
27
13
0
05 Apr 2021
Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation
Xizi Chen
Jingyang Zhu
Jingbo Jiang
Chi-Ying Tsui
37
12
0
03 Apr 2021
Exploring Edge TPU for Network Intrusion Detection in IoT
Seyedehfaezeh Hosseininoorbin
S. Layeghy
Mohanad Sarhan
Raja Jurdak
Marius Portmann
39
22
0
30 Mar 2021
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
Jonathan T. Barron
B. Mildenhall
Matthew Tancik
Peter Hedman
Ricardo Martín Brualla
Pratul P. Srinivasan
132
1,994
0
24 Mar 2021
FastMoE: A Fast Mixture-of-Expert Training System
Jiaao He
J. Qiu
Aohan Zeng
Zhilin Yang
Jidong Zhai
Jie Tang
ALM MoE
109
104
0
24 Mar 2021
Hardware Acceleration of Explainable Machine Learning using Tensor Processing Units
Zhixin Pan
Prabhat Mishra
69
18
0
22 Mar 2021
Extending Sparse Tensor Accelerators to Support Multiple Compression Formats
Eric Qin
Geonhwa Jeong
William Won
Sheng-Chun Kao
Hyoukjun Kwon
Sudarshan Srinivasan
Dipankar Das
G. Moon
S. Rajamanickam
T. Krishna
65
19
0
18 Mar 2021
Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators
Raveesh Garg
Eric Qin
Francisco Munoz-Martínez
Robert Guirado
Akshay Jain
...
José L. Abellán
M. Acacio
Eduard Alarcón
S. Rajamanickam
T. Krishna
GNN
26
19
0
14 Mar 2021
Revisiting ResNets: Improved Training and Scaling Strategies
Irwan Bello
W. Fedus
Xianzhi Du
E. D. Cubuk
A. Srinivas
Nayeon Lee
Jonathon Shlens
Barret Zoph
98
302
0
13 Mar 2021
The Old and the New: Can Physics-Informed Deep-Learning Replace Traditional Linear Solvers?
Stefano Markidis
PINN
75
193
0
12 Mar 2021
Performance of a Geometric Deep Learning Pipeline for HL-LHC Particle Tracking
X. Ju
D. Murnane
P. Calafiura
Nicholas Choma
S. Conlon
...
Aditi Chauhan
A. Schuy
Shih-Chieh Hsu
A. Ballow
A. Lazar
59
65
0
11 Mar 2021
Proof-of-Learning: Definitions and Practice
Hengrui Jia
Mohammad Yaghini
Christopher A. Choquette-Choo
Natalie Dullerud
Anvith Thudi
Varun Chandrasekaran
Nicolas Papernot
AAML
86
106
0
09 Mar 2021
F-CAD: A Framework to Explore Hardware Accelerators for Codec Avatar Decoding
Xiaofan Zhang
Dawei Wang
P. Chuang
Shugao Ma
Deming Chen
Yuecheng Li
VGen
70
10
0
08 Mar 2021
Reliability-Aware Quantization for Anti-Aging NPUs
Sami Salamin
Georgios Zervakis
Ourania Spantidi
Iraklis Anagnostopoulos
J. Henkel
H. Amrouch
25
13
0
08 Mar 2021
ShEF: Shielded Enclaves for Cloud FPGAs
Mark Zhao
Mingyu Gao
Christos Kozyrakis
74
57
0
05 Mar 2021
BM3D vs 2-Layer ONN
Junaid Malik
S. Kiranyaz
Mehmet Yamaç
Moncef Gabbouj
50
11
0
04 Mar 2021
Sparse Training Theory for Scalable and Efficient Agents
Decebal Constantin Mocanu
Elena Mocanu
T. Pinto
Selima Curci
Phuong H. Nguyen
M. Gibescu
D. Ernst
Z. Vale
80
18
0
02 Mar 2021
Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search
Kartik Hegde
Po-An Tsai
Sitao Huang
Vikas Chandra
A. Parashar
Christopher W. Fletcher
72
97
0
02 Mar 2021