1704.04760
Cited By
In-Datacenter Performance Analysis of a Tensor Processing Unit
16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
Papers citing
"In-Datacenter Performance Analysis of a Tensor Processing Unit"
50 / 1,167 papers shown
Title
Extreme Acceleration of Graph Neural Network-based Prediction Models for Quantum Chemistry
Hatem Helal
J. Firoz
Jenna A. Bilbrey
M. M. Krell
Tom Murray
Ang Li
S. Xantheas
Sutanay Choudhury
GNN
111
5
0
25 Nov 2022
Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization
Zifa Wang
Nan Ding
Tomer Levinboim
Xi Chen
Radu Soricut
AAML
79
6
0
22 Nov 2022
ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining
C. Peltekis
D. Filippas
G. Dimitrakopoulos
C. Nicopoulos
D. Pnevmatikatos
44
5
0
22 Nov 2022
Intelligent Computing: The Latest Advances, Challenges and Future
Shiqiang Zhu
Ting Yu
Tao Xu
Hongyang Chen
Schahram Dustdar
...
Tariq S. Durrani
Huaimin Wang
Jiangxing Wu
Tongyi Zhang
Yunhe Pan
AI4CE
87
129
0
21 Nov 2022
Intelligence Processing Units Accelerate Neuromorphic Learning
P. Sun
A. Titterton
Anjlee Gopiani
Tim Santos
A. Basu
Wei D. Lu
Jason K. Eshraghian
47
8
0
19 Nov 2022
Long-Range Zero-Shot Generative Deep Network Quantization
Yan Luo
Yangcheng Gao
Zhao Zhang
Haijun Zhang
Mingliang Xu
Meng Wang
MQ
90
10
0
13 Nov 2022
Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss
Guanlong Zhao
Quan Wang
Han Lu
Yiling Huang
Ignacio López Moreno
69
14
0
11 Nov 2022
Power Grid Congestion Management via Topology Optimization with AlphaZero
Matthias Dorfer
Anton R. Fuxjäger
Kristián Kozák
P. Blies
Marcel Wasserer
62
21
0
10 Nov 2022
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems
Shaan Bijwadia
Shuo-yiin Chang
Yue Liu
Tara N. Sainath
Chaoyang Zhang
Yanzhang He
75
8
0
01 Nov 2022
Tech Report: One-stage Lightweight Object Detectors
Deokki Hong
ObjD
35
0
0
31 Oct 2022
FatNet: High Resolution Kernels for Classification Using Fully Convolutional Optical Neural Networks
Riad Ibadulla
Thomas M. Chen
C. Reyes-Aldasoro
22
8
0
30 Oct 2022
An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks
Yuxiang Wu
Yu Zhao
Baotian Hu
Pasquale Minervini
Pontus Stenetorp
Sebastian Riedel
RALM
KELM
101
45
0
30 Oct 2022
ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs
Guyue Huang
Yang Bai
Liu Liu
Yuke Wang
Bei Yu
Yufei Ding
Yuan Xie
88
18
0
29 Oct 2022
tf.data service: A Case for Disaggregating ML Input Data Processing
Andrew Audibert
Yangrui Chen
D. Graur
Ana Klimovic
Jiří Šimša
C. A. Thekkath
95
18
0
26 Oct 2022
A Trustless Architecture of Blockchain-enabled Metaverse
Minghui Xu
Yihao Guo
Qin Hu
Zehui Xiong
Dongxiao Yu
Xiuzhen Cheng
112
52
0
23 Oct 2022
Benchmarking GPU and TPU Performance with Graph Neural Networks
X. Ju
Yunsong Wang
D. Murnane
Nicholas Choma
S. Farrell
P. Calafiura
GNN
42
2
0
21 Oct 2022
Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction
Muralidhar Andoorveedu
Zhanda Zhu
Bojian Zheng
Gennady Pekhimenko
47
7
0
19 Oct 2022
Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs
Yaoyao Ding
Cody Hao Yu
Bojian Zheng
Yizhi Liu
Yida Wang
Gennady Pekhimenko
91
32
0
18 Oct 2022
Knowledge-grounded Dialog State Tracking
Dian Yu
Mingqiu Wang
Yuan Cao
Izhak Shafran
Laurent El Shafey
H. Soltau
BDL
82
3
0
13 Oct 2022
Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR
DongSeon Hwang
K. Sim
Yu Zhang
Trevor Strohman
53
11
0
11 Oct 2022
RoHNAS: A Neural Architecture Search Framework with Conjoint Optimization for Adversarial Robustness and Hardware Efficiency of Convolutional and Capsule Networks
Alberto Marchisio
Vojtěch Mrázek
Andrea Massa
Beatrice Bussolino
Maurizio Martina
Mohamed Bennai
AAML
140
6
0
11 Oct 2022
Energy-Efficient Deployment of Machine Learning Workloads on Neuromorphic Hardware
Peyton S. Chandarana
Mohammadreza Mohammadi
J. Seekings
Ramtin Zand
73
6
0
10 Oct 2022
GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation
O. Sýkora
P. Phothilimthana
Charith Mendis
Amir Yazdanbakhsh
GNN
108
21
0
08 Oct 2022
Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR
Sami Alabed
Dominik Grewe
Juliana Franco
Bart Chrzaszcz
Tom Natan
Tamara Norman
Norman A. Rink
Dimitrios Vytiniotis
Michael Schaarschmidt
MoE
39
1
0
07 Oct 2022
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Skanda Koppula
Yazhe Li
Evan Shelhamer
Andrew Jaegle
Nikhil Parthasarathy
Relja Arandjelović
João Carreira
Olivier J. Hénaff
84
9
0
30 Sep 2022
Physics-aware Differentiable Discrete Codesign for Diffractive Optical Neural Networks
Yingjie Li
Ruiyang Chen
Weilu Gao
Cunxi Yu
82
12
0
28 Sep 2022
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Seongmin Hong
Seungjae Moon
Junsoo Kim
Sungjae Lee
Minsub Kim
Dongsoo Lee
Joo-Young Kim
171
83
0
22 Sep 2022
POAS: A high-performance scheduling framework for exploiting Accelerator Level Parallelism
Pablo Antonio Martínez
Gregorio Bernabé
J. M. García
23
1
0
21 Sep 2022
In-Network Accumulation: Extending the Role of NoC for DNN Acceleration
Binayak Tiwari
Mei Yang
Xiaohang Wang
Yingtao Jiang
114
0
0
21 Sep 2022
Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud
Geraldo F. Oliveira
Juan Gómez Luna
Saugata Ghose
Amirali Boroumand
O. Mutlu
71
26
0
19 Sep 2022
PIM-QAT: Neural Network Quantization for Processing-In-Memory (PIM) Systems
Qing Jin
Zhiyu Chen
J. Ren
Yanyu Li
Yanzhi Wang
Kai-Min Yang
MQ
38
4
0
18 Sep 2022
A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems
Andrea Gesmundo
61
18
0
15 Sep 2022
FreeGaze: Resource-efficient Gaze Estimation via Frequency Domain Contrastive Learning
Li-Huan Du
Guohao Lan
68
4
0
14 Sep 2022
TASKED: Transformer-based Adversarial learning for human activity recognition using wearable sensors via Self-KnowledgE Distillation
Sungho Suh
Vitor Fortes Rey
P. Lukowicz
97
63
0
14 Sep 2022
Chiplets and the Codelet Model
D. Fox
J. M. Diaz
Xiaoming Li
27
0
0
13 Sep 2022
Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation
Amir Yazdanbakhsh
Ashkan Moradifirouzabadi
Zheng Li
Mingu Kang
84
33
0
01 Sep 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
88
60
0
30 Aug 2022
Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System
Zhendong Wang
Xiaoming Zeng
Xulong Tang
Qiang Yan
Xingbo Hu
Yang Hu
AAML
MIACV
FedML
43
6
0
29 Aug 2022
DiVa: An Accelerator for Differentially Private Machine Learning
Beom-Joo Park
Ranggi Hwang
Dongho Yoon
Yoonhyuk Choi
Minsoo Rhu
53
9
0
26 Aug 2022
Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems
Prasoon Sinha
Akhil Guliani
Rutwik Jain
Brandon Tran
Matthew D. Sinclair
Shivaram Venkataraman
79
18
0
23 Aug 2022
ECI: a Customizable Cache Coherency Stack for Hybrid FPGA-CPU Architectures
Abishek Ramdas
Michael J. Giardino
Runbin Shi
A. Turowski
David A. Cock
Gustavo Alonso
Timothy Roscoe
21
1
0
15 Aug 2022
Optimizing Anchor-based Detectors for Autonomous Driving Scenes
Xianzhi Du
Wei-Chih Hung
Nayeon Lee
ObjD
31
1
0
11 Aug 2022
PROFET: Profiling-based CNN Training Latency Prophet for GPU Cloud Instances
Sungjae Lee
Y. Hur
Subin Park
Kyungyong Lee
51
2
0
10 Aug 2022
A Time-to-first-spike Coding and Conversion Aware Training for Energy-Efficient Deep Spiking Neural Network Processor Design
Dongwoo Lew
Kyungchul Lee
Jongsun Park
32
14
0
09 Aug 2022
On Fast Simulation of Dynamical System with Neural Vector Enhanced Numerical Solver
Zhongzhan Huang
Senwei Liang
Hong Zhang
Haizhao Yang
Liang Lin
AI4CE
99
9
0
07 Aug 2022
PalQuant: Accelerating High-precision Networks on Low-precision Accelerators
Qinghao Hu
Gang Li
Qiman Wu
Jian Cheng
MQ
35
2
0
03 Aug 2022
OLLIE: Derivation-based Tensor Program Optimizer
Liyan Zheng
Haojie Wang
Jidong Zhai
Muyan Hu
Zixuan Ma
Tuowei Wang
Shizhi Tang
Lei Xie
Kezhao Huang
Zhihao Jia
73
3
0
02 Aug 2022
CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for Energy-Efficient Low-precision Deep Convolutional Neural Networks
Muhammad Abdullah Hanif
G. M. Sarda
Alberto Marchisio
Guido Masera
Maurizio Martina
Mohamed Bennai
MQ
62
4
0
31 Jul 2022
Text Classification in Memristor-based Spiking Neural Networks
Jinqi Huang
A. Serb
S. Stathopoulos
T. Prodromakis
48
14
0
27 Jul 2022
Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications
Marcelo Orenes-Vera
Esin Tureci
D. Wentzlaff
M. Martonosi
47
22
0
26 Jul 2022