In-Datacenter Performance Analysis of a Tensor Processing Unit

16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon

Papers citing "In-Datacenter Performance Analysis of a Tensor Processing Unit"

50 / 1,165 papers shown
Title
Mixed-Precision Random Projection for RandNLA on Tensor Cores
Hiroyuki Ootomo
Rio Yokota
19
3
0
10 Apr 2023
Arithmetic Intensity Balancing Convolution for Hardware-aware Efficient Block Design
Shinkook Choi
Junkyeong Choi
24
1
0
08 Apr 2023
Tensor Slicing and Optimization for Multicore NPUs
R. Sousa
M. Pereira
Yongin Kwon
Taeho Kim
Namsoon Jung
Chang Soo Kim
Michael Frank
Guido Araujo
30
5
0
06 Apr 2023
A differentiable programming framework for spin models
T. S. Farias
V. V. Schultz
José C. M. Mombach
Jonas Maziero
35
0
0
04 Apr 2023
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
N. Jouppi
George Kurian
Sheng Li
Peter C. Ma
R. Nagarajan
...
Brian Towles
C. Young
Xiaoping Zhou
Zongwei Zhou
David A. Patterson
BDL
VLM
55
341
0
04 Apr 2023
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Rami Botros
Anmol Gulati
Tara N. Sainath
K. Choromanski
Ruoming Pang
Trevor Strohman
Weiran Wang
Jiahui Yu
MQ
28
3
0
31 Mar 2023
D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs
Aditya Dhakal
Sameer G. Kulkarni
K. Ramakrishnan
22
3
0
31 Mar 2023
PowerPruning: Selecting Weights and Activations for Power-Efficient Neural Network Acceleration
Richard Petri
Grace Li Zhang
Yiran Chen
Ulf Schlichtmann
Bing Li
24
6
0
24 Mar 2023
Pre-NeRF 360: Enriching Unbounded Appearances for Neural Radiance Fields
Ahmad AlMughrabi
Umair Haroon
Ricardo Marques
Petia Radeva
35
5
0
21 Mar 2023
Economical Quaternion Extraction from a Human Skeletal Pose Estimate using 2-D Cameras
S. Radhakrishna
A. Balasubramanyam
3DH
37
1
0
15 Mar 2023
DeepAxe: A Framework for Exploration of Approximation and Reliability Trade-offs in DNN Accelerators
Mahdi Taheri
M. Riazati
Mohammad Hasan Ahmadilivani
M. Jenihhin
Masoud Daneshtalab
J. Raik
Mikael Sjödin
B. Lisper
57
20
0
14 Mar 2023
X-Former: In-Memory Acceleration of Transformers
S. Sridharan
Jacob R. Stevens
Kaushik Roy
A. Raghunathan
GNN
26
36
0
13 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
50
86
0
06 Mar 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
26
153
0
03 Mar 2023
HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture
Yi-Chien Lin
Viktor Prasanna
GNN
37
7
0
01 Mar 2023
Auxiliary MCMC and particle Gibbs samplers for parallelisable inference in latent dynamical systems
Adrien Corenflos
Simo Särkkä
18
0
0
01 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
102
0
27 Feb 2023
MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation
Samuel Hsia
Udit Gupta
Bilge Acun
Newsha Ardalani
Pan Zhong
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
49
17
0
21 Feb 2023
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs
Geonhwa Jeong
S. Damani
Abhimanyu Bambhaniya
Eric Qin
C. Hughes
S. Subramoney
Hyesoon Kim
T. Krishna
MoE
46
24
0
17 Feb 2023
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
Minghao Li
Ran Ben-Basat
S. Vargaftik
Chon-In Lao
Ke Xu
Michael Mitzenmacher
Minlan Yu
26
15
0
16 Feb 2023
Toward matrix multiplication for deep learning inference on the Xilinx Versal
Jie Lei
J. Flich
Enrique S. Quintana-Ortí
16
4
0
15 Feb 2023
Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle
Vanessa Mehlin
Sigurd Schacht
Carsten Lanquillon
HAI
MedIm
33
19
0
05 Feb 2023
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
31
47
0
02 Feb 2023
Bit-balance: Model-Hardware Co-design for Accelerating NNs by Exploiting Bit-level Sparsity
Wenhao Sun
Zhiwei Zou
Deng Liu
Wendi Sun
Song Chen
Yi Kang
MQ
17
4
0
01 Feb 2023
A Green(er) World for A.I
Dan Zhao
Nathan C. Frey
Joseph McDonald
Matthew Hubbell
David Bestor
Michael Jones
Andrew Prout
V. Gadepally
S. Samsi
32
6
0
27 Jan 2023
PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices
Yuji Chai
Devashree Tripathy
Chu Zhou
Dibakar Gope
Igor Fedorov
Ramon Matas
David Brooks
Gu-Yeon Wei
P. Whatmough
GNN
37
4
0
26 Jan 2023
SGCN: Exploiting Compressed-Sparse Features in Deep Graph Convolutional Network Accelerators
Mingi Yoo
Jaeyong Song
Jounghoo Lee
Namhyung Kim
Youngsok Kim
Jinho Lee
GNN
43
18
0
25 Jan 2023
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song
Jinkyu Yim
Jaewon Jung
Hongsun Jang
H. Kim
Youngsok Kim
Jinho Lee
GNN
31
25
0
24 Jan 2023
Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators
Min-hee Yoo
Jaeyong Song
Hyeyoon Lee
Jounghoo Lee
Namhyung Kim
Youngsok Kim
Jinho Lee
GNN
48
5
0
24 Jan 2023
Enabling Hard Constraints in Differentiable Neural Network and Accelerator Co-Exploration
Deokki Hong
Kanghyun Choi
Hyeyoon Lee
Joonsang Yu
Noseong Park
Youngsok Kim
Jinho Lee
19
3
0
23 Jan 2023
Analog, In-memory Compute Architectures for Artificial Intelligence
Patrick Bowen
G. Regev
Nir Regev
Bruno U. Pedroni
Edward Hanson
Yiran Chen
17
3
0
13 Jan 2023
Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics
George Michelogiannakis
Yehia Arafa
B. Cook
Liang Yuan Dai
Abdel-Hameed A. Badawy
Madeleine Glick
Yuyang Wang
Keren Bergman
J. Shalf
14
8
0
09 Jan 2023
FlexShard: Flexible Sharding for Industry-Scale Sequence Recommendation Models
Geet Sethi
Pallab Bhattacharya
Dhruv Choudhary
Carole-Jean Wu
Christos Kozyrakis
24
5
0
08 Jan 2023
A Theory of I/O-Efficient Sparse Neural Network Inference
Niels Gleinig
Tal Ben-Nun
Torsten Hoefler
30
0
0
03 Jan 2023
Accelerating CNN inference on long vector architectures via co-design
Sonia Rani Gupta
Nikela Papadopoulou
Miquel Pericàs
3DV
13
4
0
22 Dec 2022
Annotated History of Modern AI and Deep Learning
Juergen Schmidhuber
MLAU
AI4TS
AI4CE
33
22
0
21 Dec 2022
Sophisticated deep learning with on-chip optical diffractive tensor processing
Yuyao Huang
Tingzhao Fu
Honghao Huang
Sigang Yang
Hong-wei Chen
BDL
13
13
0
20 Dec 2022
AnyTOD: A Programmable Task-Oriented Dialog System
Jeffrey Zhao
Yuan Cao
Raghav Gupta
Harrison Lee
Abhinav Rastogi
Mingqiu Wang
H. Soltau
Izhak Shafran
Yonghui Wu
VLM
36
10
0
20 Dec 2022
Containerisation for High Performance Computing Systems: Survey and Prospects
Naweiluo Zhou
Huan Zhou
Dennis Hoppe
38
25
0
16 Dec 2022
Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks
Mingyu Liang
Wenyin Fu
Louis Feng
Zhongyi Lin
P. Panakanti
Shengbao Zheng
Srinivas Sridharan
Christina Delimitrou
26
12
0
16 Dec 2022
Analytical Engines With Context-Rich Processing: Towards Efficient Next-Generation Analytics
Viktor Sanca
Anastasia Ailamaki
17
4
0
14 Dec 2022
DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling
L. Mei
Koen Goetschalckx
Arne Symons
Marian Verhelst
39
29
0
10 Dec 2022
Integration of a systolic array based hardware accelerator into a DNN operator auto-tuning framework
Federico Nicolás Peccia
Oliver Bringmann
16
5
0
06 Dec 2022
DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation
Liu Ke
Xuan Zhang
Benjamin C. Lee
G. E. Suh
Hsien-Hsin S. Lee
43
8
0
02 Dec 2022
On-device Training: A First Overview on Existing Systems
Shuai Zhu
Thiemo Voigt
Jeonggil Ko
Fatemeh Rahimian
34
14
0
01 Dec 2022
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Trevor Gale
Deepak Narayanan
C. Young
Matei A. Zaharia
MoE
30
103
0
29 Nov 2022
Edge Video Analytics: A Survey on Applications, Systems and Enabling Techniques
Renjie Xu
S. Razavi
Rong Zheng
44
15
0
28 Nov 2022
Extreme Acceleration of Graph Neural Network-based Prediction Models for Quantum Chemistry
Hatem Helal
J. Firoz
Jenna A. Bilbrey
M. M. Krell
Tom Murray
Ang Li
S. Xantheas
Sutanay Choudhury
GNN
49
5
0
25 Nov 2022
Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization
Zifa Wang
Nan Ding
Tomer Levinboim
Xi Chen
Radu Soricut
AAML
37
5
0
22 Nov 2022
ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining
C. Peltekis
D. Filippas
G. Dimitrakopoulos
C. Nicopoulos
D. Pnevmatikatos
24
5
0
22 Nov 2022