In-Datacenter Performance Analysis of a Tensor Processing Unit

16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon

Papers citing "In-Datacenter Performance Analysis of a Tensor Processing Unit"

50 / 1,167 papers shown
Streaming Batch Eigenupdates for Hardware Neuromorphic Networks
Brian D. Hoskins
M. Daniels
Siyuan Huang
A. Madhavan
G. Adam
N. Zhitenev
Jabez J. McClelland
M. D. Stiles
57
14
0
05 Mar 2019
FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning
P. Whatmough
Chuteng Zhou
Patrick Hansen
S. Venkataramanaiah
Jae-sun Seo
Matthew Mattina
73
58
0
27 Feb 2019
A Survey on Graph Processing Accelerators: Challenges and Opportunities
Chuangyi Gui
Long Zheng
Bingsheng He
Cheng Liu
Xinyu Chen
Xiaofei Liao
Hai Jin
GNN
128
71
0
26 Feb 2019
Learned Step Size Quantization
S. K. Esser
J. McKinstry
Deepika Bablani
R. Appuswamy
D. Modha
MQ
77
811
0
21 Feb 2019
DNNVM : End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-based CNN Accelerators
Yu Xing
Shuang Liang
Lingzhi Sui
Xijie Jia
Jiantao Qiu
Xin Liu
Yushun Wang
Yu Wang
Yi Shan
78
71
0
20 Feb 2019
Low-bit Quantization of Neural Networks for Efficient Inference
Yoni Choukroun
Eli Kravchik
Fan Yang
P. Kisilev
MQ
86
366
0
18 Feb 2019
Graph-RISE: Graph-Regularized Image Semantic Embedding
Da-Cheng Juan
Chun-Ta Lu
Zerui Li
Futang Peng
Aleksei Timofeev
Yi-Ting Chen
Yaxi Gao
Tom Duerig
Andrew Tomkins
Sujith Ravi
83
40
0
14 Feb 2019
Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications
Peifeng Yu
Mosharaf Chowdhury
57
73
0
12 Feb 2019
PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite
Jiajia Li
Yuchen Ma
Xiaolong Wu
Ang Li
Kevin J. Barker
42
18
0
08 Feb 2019
SiamVGG: Visual Tracking using Deeper Siamese Networks
Yuhong Li
Xiaofan Zhang
Deming Chen
ViT
107
47
0
07 Feb 2019
Exploration of Performance and Energy Trade-offs for Heterogeneous Multicore Architectures
Anastasiia Butko
Florent Bruguier
D. Novo
A. Gamatie
G. Sassatelli
32
10
0
06 Feb 2019
Neural-Network Guided Expression Transformation
Romain Edelmann
Viktor Kunčak
43
1
0
06 Feb 2019
Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization
Eldad Meller
Alexander Finkelstein
Uri Almog
Mark Grobman
MQ
74
87
0
05 Feb 2019
ROMANet: Fine-Grained Reuse-Driven Off-Chip Memory Access Management and Data Organization for Deep Neural Network Accelerators
Rachmad Vidya Wicaksana Putra
Muhammad Abdullah Hanif
Mohamed Bennai
68
22
0
04 Feb 2019
CapStore: Energy-Efficient Design and Management of the On-Chip Memory for CapsuleNet Inference Accelerators
Alberto Marchisio
Muhammad Abdullah Hanif
Mohammad Taghi Teimoori
Mohamed Bennai
52
7
0
04 Feb 2019
TF-Replicator: Distributed Machine Learning for Researchers
P. Buchlovsky
David Budden
Dominik Grewe
Chris Jones
John Aslanides
...
Aidan Clark
Sergio Gomez Colmenarejo
Aedan Pope
Fabio Viola
Dan Belov
GNN OffRL AI4CE
81
20
0
01 Feb 2019
Memory-Efficient Adaptive Optimization
Rohan Anil
Vineet Gupta
Tomer Koren
Y. Singer
ODL
90
49
0
30 Jan 2019
The OoO VLIW JIT Compiler for GPU Inference
Paras Jain
Xiangxi Mo
Ajay Jain
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica
104
17
0
28 Jan 2019
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
Ritchie Zhao
Yuwei Hu
Jordan Dotzel
Christopher De Sa
Zhiru Zhang
OODD MQ
146
312
0
28 Jan 2019
FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture
Yu Ji
Youyang Zhang
Xinfeng Xie
Shuangchen Li
Peiqi Wang
Xing Hu
Youhui Zhang
Yuan Xie
47
55
0
28 Jan 2019
Intrinsically Sparse Long Short-Term Memory Networks
Shiwei Liu
Decebal Constantin Mocanu
Mykola Pechenizkiy
55
9
0
26 Jan 2019
Sparse evolutionary Deep Learning with over one million artificial neurons on commodity hardware
Shiwei Liu
Decebal Constantin Mocanu
A. R. Ramapuram Matavalam
Yulong Pei
Mykola Pechenizkiy
BDL
90
93
0
26 Jan 2019
Revisiting Self-Supervised Visual Representation Learning
Alexander Kolesnikov
Xiaohua Zhai
Lucas Beyer
SSL
205
717
0
25 Jan 2019
Pricing options and computing implied volatilities using neural networks
Shuaiqiang Liu
C. Oosterlee
S. Bohté
82
123
0
25 Jan 2019
Large-Batch Training for LSTM and Beyond
Yang You
Jonathan Hseu
Chris Ying
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
63
91
0
24 Jan 2019
Partition Pruning: Parallelization-Aware Pruning for Deep Neural Networks
Sina Shahhosseini
Ahmad Albaqsami
Masoomeh Jasemi
N. Bagherzadeh
22
8
0
21 Jan 2019
Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going
Erwei Wang
James J. Davis
Ruizhe Zhao
Ho-Cheung Ng
Xinyu Niu
Wayne Luk
P. Cheung
George A. Constantinides
88
59
0
21 Jan 2019
No DNN Left Behind: Improving Inference in the Cloud with Multi-Tenancy
Amit Samanta
Suhas Shrinivasan
Antoine Kaufmann
Jonathan Mace
AI4CE
23
7
0
21 Jan 2019
Heterogeneous FPGA+GPU Embedded Systems: Challenges and Opportunities
Mohammad Hosseinabady
M. A. Zainol
J. Núñez-Yáñez
19
10
0
18 Jan 2019
NNStreamer: Stream Processing Paradigm for Neural Networks, Toward Efficient Development and Execution of On-Device AI Applications
MyungJoo Ham
Jijoong Moon
Geunsik Lim
Wook Song
Jaeyun Jung
...
Sangjung Woo
Youngchul Cho
Jinhyuck Park
Sewon Oh
Hong-Seok Kim
19
6
0
12 Jan 2019
Low Precision Constant Parameter CNN on FPGA
Thiam Khean Hah
Yeong Tat Liew
Jason Ong
23
2
0
11 Jan 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
A. Koliousis
Pijika Watcharapichat
Matthias Weidlich
Kai Zou
Paolo Costa
Peter R. Pietzuch
65
70
0
08 Jan 2019
HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
Linghao Song
Jiachen Mao
Youwei Zhuo
Xuehai Qian
Hai Helen Li
Yiran Chen
90
98
0
07 Jan 2019
DSConv: Efficient Convolution Operator
Marcelo Gennari
Roger Fawcett
V. Prisacariu
MQ
48
68
0
07 Jan 2019
Dynamic Space-Time Scheduling for GPU Inference
Paras Jain
Xiangxi Mo
Ajay Jain
Harikaran Subbaraj
Rehana Durrani
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica
80
66
0
31 Dec 2018
Batch Size Influence on Performance of Graphic and Tensor Processing Units during Training and Inference Phases
Yuriy Kochura
Yuri G. Gordienko
Vlad Taran
N. Gordienko
Alexandr Rokovyi
Oleg Alienin
S. Stirenko
AI4CE
39
31
0
31 Dec 2018
ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning
Hajar Falahati
Pejman Lotfi-Kamran
Mohammad Sadrosadati
H. Sarbazi-Azad
33
8
0
30 Dec 2018
Distill-Net: Application-Specific Distillation of Deep Convolutional Neural Networks for Resource-Constrained IoT Platforms
Mohammad Motamedi
Felix Portillo
Daniel D. Fong
S. Ghiasi
35
3
0
16 Dec 2018
Bayesian Layers: A Module for Neural Network Uncertainty
Dustin Tran
Michael W. Dusenberry
Mark van der Wilk
Danijar Hafner
UQCV BDL
131
124
0
10 Dec 2018
Wireless Network Intelligence at the Edge
Jihong Park
S. Samarakoon
M. Bennis
Mérouane Debbah
113
521
0
07 Dec 2018
InferLine: ML Prediction Pipeline Provisioning and Management for Tight Latency Objectives
D. Crankshaw
Gur-Eyal Sela
Corey Zumar
Xiangxi Mo
Joseph E. Gonzalez
Ion Stoica
Alexey Tumanov
62
38
0
05 Dec 2018
Deep Positron: A Deep Neural Network Using the Posit Number System
Zachariah Carmichael
Seyed Hamed Fatemi Langroudi
Char Khazanov
Jeffrey Lillie
J. Gustafson
Dhireesha Kudithipudi
MQ
78
96
0
05 Dec 2018
Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling
Jacob Menick
Nal Kalchbrenner
106
151
0
04 Dec 2018
Making BREAD: Biomimetic strategies for Artificial Intelligence Now and in the Future
J. Krichmar
William M. Severa
Salar M. Khan
J. Olds
AI4CE
39
21
0
04 Dec 2018
Pre-Defined Sparse Neural Networks with Hardware Acceleration
Sourya Dey
Kuan-Wen Huang
Peter A. Beerel
K. Chugg
109
25
0
04 Dec 2018
Predicting the Computational Cost of Deep Learning Models
Daniel Justus
John Brennan
Stephen Bonner
A. Mcgough
46
231
0
28 Nov 2018
Efficient non-uniform quantizer for quantized neural network targeting reconfigurable hardware
Natan Liss
Chaim Baskin
A. Mendelson
A. Bronstein
Raja Giryes
MQ
41
5
0
27 Nov 2018
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
Jongsoo Park
Maxim Naumov
Protonu Basu
Summer Deng
Aravind Kalaiah
...
Lin Qiao
Vijay Rao
Nadav Rotem
S. Yoo
M. Smelyanskiy
FedML GNN BDL
93
187
0
24 Nov 2018
Building Efficient Deep Neural Networks with Unitary Group Convolutions
Ritchie Zhao
Yuwei Hu
Jordan Dotzel
Christopher De Sa
Zhiru Zhang
77
28
0
19 Nov 2018
A Survey on Spark Ecosystem for Big Data Processing
Shanjian Tang
Bingsheng He
Ce Yu
Yusen Li
Kun Li
34
11
0
18 Nov 2018