In-Datacenter Performance Analysis of a Tensor Processing Unit


16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon

Papers citing "In-Datacenter Performance Analysis of a Tensor Processing Unit"

50 / 1,165 papers shown
Learned Step Size Quantization
S. K. Esser
J. McKinstry
Deepika Bablani
R. Appuswamy
D. Modha
MQ
31
782
0
21 Feb 2019
DNNVM : End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-based CNN Accelerators
Yu Xing
Shuang Liang
Lingzhi Sui
Xijie Jia
Jiantao Qiu
Xin Liu
Yushun Wang
Yu Wang
Yi Shan
46
68
0
20 Feb 2019
Low-bit Quantization of Neural Networks for Efficient Inference
Yoni Choukroun
Eli Kravchik
Fan Yang
P. Kisilev
MQ
33
356
0
18 Feb 2019
Graph-RISE: Graph-Regularized Image Semantic Embedding
Da-Cheng Juan
Chun-Ta Lu
Zerui Li
Futang Peng
Aleksei Timofeev
Yi-Ting Chen
Yaxi Gao
Tom Duerig
Andrew Tomkins
Sujith Ravi
26
40
0
14 Feb 2019
Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications
Peifeng Yu
Mosharaf Chowdhury
18
72
0
12 Feb 2019
PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite
Jiajia Li
Yuchen Ma
Xiaolong Wu
Ang Li
Kevin J. Barker
25
18
0
08 Feb 2019
SiamVGG: Visual Tracking using Deeper Siamese Networks
Yuhong Li
Xiaofan Zhang
Deming Chen
ViT
52
47
0
07 Feb 2019
Exploration of Performance and Energy Trade-offs for Heterogeneous Multicore Architectures
Anastasiia Butko
Florent Bruguier
D. Novo
A. Gamatie
G. Sassatelli
19
10
0
06 Feb 2019
Neural-Network Guided Expression Transformation
Romain Edelmann
Viktor Kunčak
16
1
0
06 Feb 2019
Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization
Eldad Meller
Alexander Finkelstein
Uri Almog
Mark Grobman
MQ
24
85
0
05 Feb 2019
ROMANet: Fine-Grained Reuse-Driven Off-Chip Memory Access Management and Data Organization for Deep Neural Network Accelerators
Rachmad Vidya Wicaksana Putra
Muhammad Abdullah Hanif
Mohamed Bennai
9
21
0
04 Feb 2019
CapStore: Energy-Efficient Design and Management of the On-Chip Memory for CapsuleNet Inference Accelerators
Alberto Marchisio
Muhammad Abdullah Hanif
Mohammad Taghi Teimoori
Mohamed Bennai
8
7
0
04 Feb 2019
TF-Replicator: Distributed Machine Learning for Researchers
P. Buchlovsky
David Budden
Dominik Grewe
Chris Jones
John Aslanides
...
Aidan Clark
Sergio Gomez Colmenarejo
Aedan Pope
Fabio Viola
Dan Belov
GNN
OffRL
AI4CE
37
20
0
01 Feb 2019
Memory-Efficient Adaptive Optimization
Rohan Anil
Vineet Gupta
Tomer Koren
Y. Singer
ODL
21
49
0
30 Jan 2019
The OoO VLIW JIT Compiler for GPU Inference
Paras Jain
Xiangxi Mo
Ajay Jain
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica
39
17
0
28 Jan 2019
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
Ritchie Zhao
Yuwei Hu
Jordan Dotzel
Christopher De Sa
Zhiru Zhang
OODD
MQ
55
305
0
28 Jan 2019
FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture
Yu Ji
Youyang Zhang
Xinfeng Xie
Shuangchen Li
Peiqi Wang
Xing Hu
Youhui Zhang
Yuan Xie
25
55
0
28 Jan 2019
Intrinsically Sparse Long Short-Term Memory Networks
Shiwei Liu
Decebal Constantin Mocanu
Mykola Pechenizkiy
30
9
0
26 Jan 2019
Sparse evolutionary Deep Learning with over one million artificial neurons on commodity hardware
Shiwei Liu
Decebal Constantin Mocanu
A. R. Ramapuram Matavalam
Yulong Pei
Mykola Pechenizkiy
BDL
16
88
0
26 Jan 2019
Revisiting Self-Supervised Visual Representation Learning
Alexander Kolesnikov
Xiaohua Zhai
Lucas Beyer
SSL
77
715
0
25 Jan 2019
Pricing options and computing implied volatilities using neural networks
Shuaiqiang Liu
C. Oosterlee
S. Bohté
19
119
0
25 Jan 2019
Large-Batch Training for LSTM and Beyond
Yang You
Jonathan Hseu
Chris Ying
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
30
89
0
24 Jan 2019
Partition Pruning: Parallelization-Aware Pruning for Deep Neural Networks
Sina Shahhosseini
Ahmad Albaqsami
Masoomeh Jasemi
N. Bagherzadeh
12
8
0
21 Jan 2019
Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going
Erwei Wang
James J. Davis
Ruizhe Zhao
Ho-Cheung Ng
Xinyu Niu
Wayne Luk
P. Cheung
George A. Constantinides
24
59
0
21 Jan 2019
No DNN Left Behind: Improving Inference in the Cloud with Multi-Tenancy
Amit Samanta
Suhas Shrinivasan
Antoine Kaufmann
Jonathan Mace
AI4CE
6
7
0
21 Jan 2019
Heterogeneous FPGA+GPU Embedded Systems: Challenges and Opportunities
Mohammad Hosseinabady
M. A. Zainol
J. Núñez-Yáñez
9
10
0
18 Jan 2019
NNStreamer: Stream Processing Paradigm for Neural Networks, Toward Efficient Development and Execution of On-Device AI Applications
MyungJoo Ham
Jijoong Moon
Geunsik Lim
Wook Song
Jaeyun Jung
...
Sangjung Woo
Youngchul Cho
Jinhyuck Park
Sewon Oh
Hong-Seok Kim
17
6
0
12 Jan 2019
Low Precision Constant Parameter CNN on FPGA
Thiam Khean Hah
Yeong Tat Liew
Jason Ong
9
2
0
11 Jan 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
A. Koliousis
Pijika Watcharapichat
Matthias Weidlich
Luo Mai
Paolo Costa
Peter R. Pietzuch
32
69
0
08 Jan 2019
HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
Linghao Song
Jiachen Mao
Youwei Zhuo
Xuehai Qian
Hai Helen Li
Yiran Chen
30
97
0
07 Jan 2019
DSConv: Efficient Convolution Operator
Marcelo Gennari
Roger Fawcett
V. Prisacariu
MQ
32
62
0
07 Jan 2019
Dynamic Space-Time Scheduling for GPU Inference
Paras Jain
Xiangxi Mo
Ajay Jain
Harikaran Subbaraj
Rehana Durrani
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica
35
64
0
31 Dec 2018
Batch Size Influence on Performance of Graphic and Tensor Processing Units during Training and Inference Phases
Yuriy Kochura
Yuri G. Gordienko
Vlad Taran
N. Gordienko
Alexandr Rokovyi
Oleg Alienin
S. Stirenko
AI4CE
19
30
0
31 Dec 2018
ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning
Hajar Falahati
Pejman Lotfi-Kamran
Mohammad Sadrosadati
H. Sarbazi-Azad
9
8
0
30 Dec 2018
Distill-Net: Application-Specific Distillation of Deep Convolutional Neural Networks for Resource-Constrained IoT Platforms
Mohammad Motamedi
Felix Portillo
Daniel D. Fong
S. Ghiasi
28
3
0
16 Dec 2018
Bayesian Layers: A Module for Neural Network Uncertainty
Dustin Tran
Michael W. Dusenberry
Mark van der Wilk
Danijar Hafner
UQCV
BDL
27
121
0
10 Dec 2018
Wireless Network Intelligence at the Edge
Jihong Park
S. Samarakoon
M. Bennis
Mérouane Debbah
23
518
0
07 Dec 2018
InferLine: ML Prediction Pipeline Provisioning and Management for Tight Latency Objectives
D. Crankshaw
Gur-Eyal Sela
Corey Zumar
Xiangxi Mo
Joseph E. Gonzalez
Ion Stoica
Alexey Tumanov
14
38
0
05 Dec 2018
Deep Positron: A Deep Neural Network Using the Posit Number System
Zachariah Carmichael
Seyed Hamed Fatemi Langroudi
Char Khazanov
Jeffrey Lillie
J. Gustafson
Dhireesha Kudithipudi
MQ
17
96
0
05 Dec 2018
Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling
Jacob Menick
Nal Kalchbrenner
28
150
0
04 Dec 2018
Making BREAD: Biomimetic strategies for Artificial Intelligence Now and in the Future
J. Krichmar
William M. Severa
Salar M. Khan
J. Olds
AI4CE
21
21
0
04 Dec 2018
Pre-Defined Sparse Neural Networks with Hardware Acceleration
Sourya Dey
Kuan-Wen Huang
P. Beerel
K. Chugg
41
24
0
04 Dec 2018
Predicting the Computational Cost of Deep Learning Models
Daniel Justus
John Brennan
Stephen Bonner
A. Mcgough
9
228
0
28 Nov 2018
Efficient non-uniform quantizer for quantized neural network targeting reconfigurable hardware
Natan Liss
Chaim Baskin
A. Mendelson
A. Bronstein
Raja Giryes
MQ
24
5
0
27 Nov 2018
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
Jongsoo Park
Maxim Naumov
Protonu Basu
Summer Deng
Aravind Kalaiah
...
Lin Qiao
Vijay Rao
Nadav Rotem
S. Yoo
M. Smelyanskiy
FedML
GNN
BDL
20
187
0
24 Nov 2018
Building Efficient Deep Neural Networks with Unitary Group Convolutions
Ritchie Zhao
Yuwei Hu
Jordan Dotzel
Christopher De Sa
Zhiru Zhang
32
28
0
19 Nov 2018
A Survey on Spark Ecosystem for Big Data Processing
Shanjian Tang
Bingsheng He
Ce Yu
Yusen Li
Kun Li
17
11
0
18 Nov 2018
Image Classification at Supercomputer Scale
Chris Ying
Sameer Kumar
Dehao Chen
Tao Wang
Youlong Cheng
VLM
21
122
0
16 Nov 2018
Streaming End-to-end Speech Recognition For Mobile Devices
Yanzhang He
Tara N. Sainath
Rohit Prabhavalkar
Ian McGraw
R. Álvarez
...
K. Sim
Tom Bagby
Shuo-yiin Chang
Kanishka Rao
A. Gruenstein
54
624
0
15 Nov 2018
Performance Estimation of Synthesis Flows cross Technologies using LSTMs and Transfer Learning
Cunxi Yu
Wang Zhou
AI4TS
9
0
0
14 Nov 2018