In-Datacenter Performance Analysis of a Tensor Processing Unit

16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon

Papers citing "In-Datacenter Performance Analysis of a Tensor Processing Unit"

50 / 1,165 papers shown
5 Parallel Prism: A topology for pipelined implementations of convolutional neural networks using computational memory
M. Dazzi
Abu Sebastian
P. Francese
Thomas Parnell
Luca Benini
E. Eleftheriou
GNN
18
8
0
08 Jun 2019
Scaling Autoregressive Video Models
Dirk Weissenborn
Oscar Täckström
Jakob Uszkoreit
DiffM
VGen
38
200
0
06 Jun 2019
(Pen-) Ultimate DNN Pruning
Marc Riera
J. Arnau
Antonio González
CVBM
11
1
0
06 Jun 2019
The Architectural Implications of Facebook's DNN-based Personalized Recommendation
Udit Gupta
Carole-Jean Wu
Xiaodong Wang
Maxim Naumov
Brandon Reagen
...
Andrey Malevich
Dheevatsa Mudigere
M. Smelyanskiy
Liang Xiong
Xuan Zhang
GNN
44
290
0
06 Jun 2019
OpenEI: An Open Framework for Edge Intelligence
Xingzhou Zhang
Yifan Wang
Sidi Lu
Liangkai Liu
Lanyu Xu
Weisong Shi
29
101
0
05 Jun 2019
SO(8) Supergravity and the Magic of Machine Learning
Iulia Comsa
Moritz Firsching
T. Fischbacher
16
40
0
01 Jun 2019
INFaaS: A Model-less and Managed Inference Serving System
Francisco Romero
Qian Li
N. Yadwadkar
Christos Kozyrakis
34
14
0
30 May 2019
Learning In Practice: Reasoning About Quantization
Annie Cherkaev
W. Tai
J. M. Phillips
Vivek Srikumar
MQ
6
1
0
27 May 2019
An Open-Source Benchmark Suite for Cloud and IoT Microservices
Yu Gan
Yanqi Zhang
Dailun Cheng
A. Shetty
Priyal Rathi
...
Mateo Espinosa Zarlenga
Rick Lin
Zhongling Liu
Jake Padilla
Christina Delimitrou
20
3
0
27 May 2019
Feature Map Transform Coding for Energy-Efficient CNN Inference
Brian Chmiel
Chaim Baskin
Ron Banner
Evgenii Zheltonozhskii
Yevgeny Yermolin
Alex Karbachevsky
A. Bronstein
A. Mendelson
33
24
0
26 May 2019
Scaling Video Analytics on Constrained Edge Nodes
Christopher Canel
Thomas Kim
Giulio Zhou
Conglong Li
Hyeontaek Lim
D. Andersen
M. Kaminsky
Subramanya R. Dulloor
11
162
0
24 May 2019
Polystore++: Accelerated Polystore System for Heterogeneous Workloads
Rekha Singhal
Nathan Zhang
Luigi Nardi
M. Shahbaz
K. Olukotun
19
8
0
24 May 2019
Learning Low-Rank Approximation for CNNs
Dongsoo Lee
S. Kwon
Byeongwook Kim
Gu-Yeon Wei
32
19
0
24 May 2019
Sentence Length
Gábor Borbély
András Kornai
23
11
0
22 May 2019
A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices
Xiaofan Zhang
Cong Hao
Yuhong Li
Yao Chen
Jinjun Xiong
Wen-mei W. Hwu
Deming Chen
26
13
0
20 May 2019
Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification
A. Linares-Barranco
A. Rios-Navarro
Ricardo Tapiador-Morales
T. Delbruck
33
19
0
17 May 2019
Gmail Smart Compose: Real-Time Assisted Writing
Mengzhao Chen
Benjamin Lee
G. Bansal
Yuan Cao
Shuyuan Zhang
...
Yinan Wang
Andrew M. Dai
Zhehuai Chen
Timothy Sohn
Yonghui Wu
24
203
0
17 May 2019
Optimizing Routerless Network-on-Chip Designs: An Innovative Learning-Based Framework
Ting-Ru Lin
Drew Penney
Massoud Pedram
Lizhong Chen
3DV
11
8
0
11 May 2019
Single-Path NAS: Device-Aware Efficient ConvNet Design
Dimitrios Stamoulis
Ruizhou Ding
Di Wang
Dimitrios Lymberopoulos
B. Priyantha
Jie Liu
Diana Marculescu
19
18
0
10 May 2019
NeuPart: Using Analytical Models to Drive Energy-Efficient Partitioning of CNN Computations on Cloud-Connected Mobile Clients
Susmita Dey Manasi
F. S. Snigdha
S. Sapatnekar
34
16
0
09 May 2019
AI Enabling Technologies: A Survey
V. Gadepally
Justin A. Goodwin
J. Kepner
Albert Reuther
Hayley Reynolds
S. Samsi
Jonathan Su
David Martinez
27
24
0
08 May 2019
Searching for MobileNetV3
Andrew G. Howard
Mark Sandler
Grace Chu
Liang-Chieh Chen
Bo Chen
...
Yukun Zhu
Ruoming Pang
Vijay Vasudevan
Quoc V. Le
Hartwig Adam
106
6,623
0
06 May 2019
Leveraging Deep Learning to Improve the Performance Predictability of Cloud Microservices
Yu Gan
Yanqi Zhang
Kelvin Hu
Dailun Cheng
Yuan He
Meghna Pancholi
Christina Delimitrou
17
2
0
02 May 2019
Parity Models: A General Framework for Coding-Based Resilience in ML Inference
J. Kosaian
K. V. Rashmi
Shivaram Venkataraman
8
14
0
02 May 2019
Full-stack Optimization for Accelerating CNNs with FPGA Validation
Bradley McDanel
Shanghang Zhang
H. T. Kung
Xin Dong
MQ
25
2
0
01 May 2019
Deep Learning for Audio Signal Processing
Hendrik Purwins
Bo Li
Tuomas Virtanen
Jan Schlüter
Shuo-yiin Chang
Tara N. Sainath
VLM
26
587
0
30 Apr 2019
SWALP : Stochastic Weight Averaging in Low-Precision Training
Guandao Yang
Tianyi Zhang
Polina Kirichenko
Junwen Bai
A. Wilson
Christopher De Sa
24
94
0
26 Apr 2019
Relay: A High-Level Compiler for Deep Learning
Jared Roesch
Steven Lyubomirsky
Marisa Kirisame
Logan Weber
Josh Pollock
Luis Vega
Ziheng Jiang
Tianqi Chen
T. Moreau
Zachary Tatlock
31
21
0
17 Apr 2019
swTVM: Towards Optimized Tensor Code Generation for Deep Learning on Sunway Many-Core Processor
Mingzhen Li
Changxi Liu
Jian-He Liao
Xuegui Zheng
Hailong Yang
...
Jun Xu
L. Gan
Guangwen Yang
Zhongzhi Luan
D. Qian
24
2
0
16 Apr 2019
Detecting Anemia from Retinal Fundus Images
A. Mitani
Yun-Hui Liu
Abigail E. Huang
G. Corrado
L. Peng
D. Webster
N. Hammel
A. Varadarajan
19
32
0
12 Apr 2019
Cramnet: Layer-wise Deep Neural Network Compression with Knowledge Transfer from a Teacher Network
J. Hoffman
20
3
0
11 Apr 2019
CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Brandon Yang
Gabriel Bender
Quoc V. Le
Jiquan Ngiam
MedIm
3DV
31
622
0
10 Apr 2019
ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors
Weicheng Kuo
A. Angelova
Jitendra Malik
Nayeon Lee
3DPC
ISeg
32
117
0
05 Apr 2019
Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours
Dimitrios Stamoulis
Ruizhou Ding
Di Wang
Dimitrios Lymberopoulos
B. Priyantha
Jie Liu
Diana Marculescu
6
283
0
05 Apr 2019
LUTNet: Rethinking Inference in FPGA Soft Logic
Erwei Wang
James J. Davis
P. Cheung
George A. Constantinides
24
61
0
01 Apr 2019
High Performance Monte Carlo Simulation of Ising Model on TPU Clusters
Kun Yang
Yi-Fan Chen
George Roumpos
Christopher Colby
John R. Anderson
16
49
0
27 Mar 2019
Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools
R. Mayer
Hans-Arno Jacobsen
GNN
27
186
0
27 Mar 2019
Performance-Efficiency Trade-off of Low-Precision Numerical Formats in Deep Neural Networks
Zachariah Carmichael
H. F. Langroudi
Char Khazanov
Jeffrey Lillie
J. Gustafson
Dhireesha Kudithipudi
29
55
0
25 Mar 2019
Topology-based Representative Datasets to Reduce Neural Network Training Resources
Rocio Gonzalez-Diaz
Miguel A. Gutiérrez-Naranjo
Eduardo Paluzo-Hidalgo
DD
17
9
0
20 Mar 2019
Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture
Y. Yu
Yingmin Li
Shuai Che
N. Jha
Weifeng Zhang
22
22
0
18 Mar 2019
A Brain-inspired Algorithm for Training Highly Sparse Neural Networks
Zahra Atashgahi
Joost Pieterse
Shiwei Liu
Decebal Constantin Mocanu
Raymond N. J. Veldhuis
Mykola Pechenizkiy
35
15
0
17 Mar 2019
Stripe: Tensor Compilation via the Nested Polyhedral Model
Tim Zerrell
J. Bruestle
15
32
0
14 Mar 2019
TensorFlow Doing HPC
Steven W. D. Chien
Stefano Markidis
V. Olshevsky
Yaroslav Bulatov
Erwin Laure
Jeffrey S. Vetter
13
15
0
11 Mar 2019
Automated Circuit Approximation Method Driven by Data Distribution
Z. Vašíček
Vojtěch Mrázek
Lukás Sekanina
15
17
0
11 Mar 2019
Analyzing GPU Tensor Core Potential for Fast Reductions
R. Carrasco
R. Vega
C. Navarro
6
11
0
08 Mar 2019
Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning (Technical Report)
Zeke Wang
Kaan Kara
Hantian Zhang
Gustavo Alonso
O. Mutlu
Ce Zhang
31
34
0
08 Mar 2019
SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems
Beidi Chen
Tharun Medini
James Farwell
Sameh Gobriel
Charlie Tai
Anshumali Shrivastava
30
103
0
07 Mar 2019
Streaming Batch Eigenupdates for Hardware Neuromorphic Networks
Brian D. Hoskins
M. Daniels
Siyuan Huang
A. Madhavan
G. Adam
N. Zhitenev
Jabez J. McClelland
M. D. Stiles
11
14
0
05 Mar 2019
FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning
P. Whatmough
Chuteng Zhou
Patrick Hansen
S. Venkataramanaiah
Jae-sun Seo
Matthew Mattina
15
57
0
27 Feb 2019
A Survey on Graph Processing Accelerators: Challenges and Opportunities
Chuangyi Gui
Long Zheng
Bingsheng He
Cheng Liu
Xinyu Chen
Xiaofei Liao
Hai Jin
GNN
7
69
0
26 Feb 2019