1704.04760
Cited By
In-Datacenter Performance Analysis of a Tensor Processing Unit
16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
Papers citing
"In-Datacenter Performance Analysis of a Tensor Processing Unit"
50 / 1,167 papers shown
Ascend HiFloat8 Format for Deep Learning
Yuanyong Luo
Zhongxing Zhang
Richard Wu
Hu Liu
Ying Jin
...
Korviakov Vladimir
Bobrin Maxim
Yuhao Hu
Guanfu Chen
Zeyi Huang
MQ
44
2
0
25 Sep 2024
FreeRide: Harvesting Bubbles in Pipeline Parallelism
Jiashu Zhang
Zihan Pan
Molly Xu
Khuzaima S. Daudjee
149
0
0
11 Sep 2024
Say No to Freeloader: Protecting Intellectual Property of Your Deep Model
Lianyu Wang
Ming Wang
Huazhu Fu
Daoqiang Zhang
86
3
0
23 Aug 2024
When In-memory Computing Meets Spiking Neural Networks -- A Perspective on Device-Circuit-System-and-Algorithm Co-design
Abhishek Moitra
Abhiroop Bhattacharjee
Yuhang Li
Youngeun Kim
Priyadarshini Panda
67
3
0
22 Aug 2024
Matmul or No Matmal in the Era of 1-bit LLMs
Jinendra Malekar
Mohammed E. Elbtity
Ramtin Zand
MQ
61
2
0
21 Aug 2024
High Performance Unstructured SpMM Computation Using Tensor Cores
Patrik Okanovic
Grzegorz Kwaśniewski
P. S. Labini
Maciej Besta
Flavio Vella
Torsten Hoefler
116
6
0
21 Aug 2024
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Viraat Aryabumi
Yixuan Su
Raymond Ma
Adrien Morisot
Ivan Zhang
Acyr Locatelli
Marzieh Fadaee
Ahmet Üstün
Sara Hooker
SyDa
AI4CE
98
26
0
20 Aug 2024
Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture
Yu Feng
Weikai Lin
Zihan Liu
Jingwen Leng
Minyi Guo
Han Zhao
Xiaofeng Hou
Jieru Zhao
Yuhao Zhu
75
7
0
13 Aug 2024
Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10
Yiqi Liu
Yuqi Xue
Yu Cheng
Lingxiao Ma
Ziming Miao
Jilong Xue
Jian Huang
GNN
107
1
0
09 Aug 2024
Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms
Yuqi Xue
Yiqi Liu
Lifeng Nai
Jian Huang
38
0
0
07 Aug 2024
LLM-Aided Compilation for Tensor Accelerators
Charles Hong
Sahil Bhatia
Altan Haan
Shengjun Kris Dong
Dima Nikiforov
Alvin Cheung
Y. Shao
74
2
0
06 Aug 2024
PENDRAM: Enabling High-Performance and Energy-Efficient Processing of Deep Neural Networks through a Generalized DRAM Data Mapping Policy
Rachmad Vidya Wicaksana Putra
Muhammad Abdullah Hanif
Muhammad Shafique
58
0
0
05 Aug 2024
Optical Computing for Deep Neural Network Acceleration: Foundations, Recent Developments, and Emerging Directions
S. Pasricha
80
0
0
30 Jul 2024
An Asynchronous Multi-core Accelerator for SNN inference
Zhuo Chen
De Ma
Xiaofei Jin
Qinghui Xing
Ouwen Jin
Xin Du
Shuibing He
Gang Pan
42
0
0
30 Jul 2024
The Magnificent Seven Challenges and Opportunities in Domain-Specific Accelerator Design for Autonomous Systems
Sabrina M. Neuman
Brian Plancher
Vijay Janapa Reddi
63
0
0
24 Jul 2024
Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi-Pumped Soft SIMD Operations
Giorgos Armeniakos
Alexis Maras
S. Xydis
Dimitrios Soudris
MQ
56
4
0
19 Jul 2024
Integrated Hardware Architecture and Device Placement Search
Irene Wang
Jakub Tarnawski
Amar Phanishayee
Divya Mahajan
108
1
0
18 Jul 2024
Enhancing Split Computing and Early Exit Applications through Predefined Sparsity
Luigi Capogrosso
Enrico Fraccaroli
Giulio Petrozziello
Francesco Setti
Samarjit Chakraborty
Franco Fummi
Marco Cristani
79
3
0
16 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
Volkan Cevher
Yida Wang
George Karypis
117
5
0
12 Jul 2024
Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture
Mohammed E. Elbtity
Peyton S. Chandarana
Ramtin Zand
116
2
0
11 Jul 2024
Privacy-Preserving and Trustworthy Deep Learning for Medical Imaging
Kiarash Sedghighadikolaei
Attila A Yavuz
61
3
0
29 Jun 2024
How to Rent GPUs on a Budget
Zhouzi Li
Benjamin Berg
Arpan Mukhopadhyay
Mor Harchol-Balter
34
0
0
21 Jun 2024
Older and Wiser: The Marriage of Device Aging and Intellectual Property Protection of Deep Neural Networks
Ning Lin
Shaocong Wang
Yue Zhang
Yangu He
Kwunhang Wong
Arindam Basu
Dashan Shang
Xiaoming Chen
Zhongrui Wang
AAML
41
1
0
21 Jun 2024
AI in Space for Scientific Missions: Strategies for Minimizing Neural-Network Model Upload
Jonah Ekelund
Ricardo Vinuesa
Yuri Khotyaintsev
Pierre Henri
G. Delzanno
Stefano Markidis
84
1
0
20 Jun 2024
Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies
Johannes Pekkilä
Oskar Lappi
Fredrik Robertsén
Maarit J. Korpi-Lagg
63
0
0
13 Jun 2024
SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling
Sai Sanjeet
B. Sahoo
Keshab K. Parhi
74
0
0
11 Jun 2024
Diversified Batch Selection for Training Acceleration
Feng Hong
Yueming Lyu
Jiangchao Yao
Ya Zhang
Ivor W. Tsang
Yanfeng Wang
114
5
0
07 Jun 2024
Differentiable Combinatorial Scheduling at Scale
Mingju Liu
Yingjie Li
Jiaqi Yin
Zhiru Zhang
Cunxi Yu
63
1
0
06 Jun 2024
To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability
Joonhyung Lee
Jeongin Bae
Byeongwook Kim
S. Kwon
Dongsoo Lee
MQ
78
1
0
29 May 2024
Understanding Transformer Reasoning Capabilities via Graph Algorithms
Clayton Sanford
Bahare Fatemi
Ethan Hall
Anton Tsitsulin
Seyed Mehran Kazemi
Jonathan J. Halcrow
Bryan Perozzi
Vahab Mirrokni
112
38
0
28 May 2024
Carbon Connect: An Ecosystem for Sustainable Computing
Benjamin C. Lee
David Brooks
Arthur van Benthem
Udit Gupta
G. Hills
...
Emma Strubell
Gu-Yeon Wei
Adam Wierman
Yuan Yao
Minlan Yu
43
2
0
22 May 2024
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
Angeline Pouget
Lucas Beyer
Emanuele Bugliarello
Xiao Wang
Andreas Steiner
Xiaohua Zhai
Ibrahim Alabdulmohsin
VLM
84
9
0
22 May 2024
Generative AI for the Optimization of Next-Generation Wireless Networks: Basics, State-of-the-Art, and Open Challenges
Fahime Khoramnejad
Ekram Hossain
62
8
0
22 May 2024
Cost-Effective Fault Tolerance for CNNs Using Parameter Vulnerability Based Hardening and Pruning
Mohammad Hasan Ahmadilivani
Seyedhamidreza Mousavi
J. Raik
Masoud Daneshtalab
M. Jenihhin
AAML
68
3
0
17 May 2024
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
R. Prabhakar
R. Sivaramakrishnan
Darshan Gandhi
Yun Du
Mingran Wang
...
Urmish Thakker
Dawei Huang
Sumti Jairath
Kevin J. Brown
K. Olukotun
MoE
77
15
0
13 May 2024
Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights
Soyed Tuhin Ahmed
Michael Hefenbrock
M. Tahoori
UQCV
62
1
0
07 May 2024
SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators
Mohanad Odema
Luke Chen
Hyoukjun Kwon
Mohammad Abdullah Al Faruque
79
4
0
01 May 2024
Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
Muhammad Adnan
Amar Phanishayee
Janardhan Kulkarni
Prashant J. Nair
Divya Mahajan
80
0
0
23 Apr 2024
Automated Text Mining of Experimental Methodologies from Biomedical Literature
Ziqing Guo
57
1
0
21 Apr 2024
Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations
Yu Feng
Zihan Liu
Jingwen Leng
Minyi Guo
Yuhao Zhu
81
13
0
18 Apr 2024
Trackable Agent-based Evolution Models at Wafer Scale
Matthew Andres Moreno
Connor Yang
Emily L. Dolson
Luis Zaman
63
5
0
16 Apr 2024
I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey
Noah Lewis
J. L. Bez
Suren Byna
109
0
0
16 Apr 2024
Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator
Abhishek Tyagi
Reiley Jeyapaul
Chuteng Zhu
Paul N. Whatmough
Yuhao Zhu
36
0
0
14 Apr 2024
Bullion: A Column Store for Machine Learning
Gang Liao
Ye Liu
Jianjun Chen
Daniel J. Abadi
77
5
0
13 Apr 2024
Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen
Niansong Zhang
Shaojie Xiang
Zhichen Zeng
Mengjia Dai
Zhiru Zhang
104
15
0
07 Apr 2024
Enhancing Trust and Privacy in Distributed Networks: A Comprehensive Survey on Blockchain-based Federated Learning
Ji Liu
Chunlu Chen
Yu Li
Lin Sun
Yulun Song
Jingbo Zhou
Bo Jing
Dejing Dou
84
10
0
28 Mar 2024
Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings
Yassaman Ebrahimzadeh Maboud
Muhammad Adnan
Divya Mahajan
Prashant J. Nair
AI4TS
124
0
0
22 Mar 2024
DG-RePlAce: A Dataflow-Driven GPU-Accelerated Analytical Global Placement Framework for Machine Learning Accelerators
Andrew B. Kahng
Zhiang Wang
34
2
0
16 Mar 2024
FlexNN: A Dataflow-aware Flexible Deep Learning Accelerator for Energy-Efficient Edge Devices
Arnab Raha
Deepak A. Mathaikutty
Soumendu Kumar Ghosh
Shamik Kundu
29
7
0
14 Mar 2024
EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural Network Acceleration
Bo Liu
Grace Li Zhang
Xunzhao Yin
Ulf Schlichtmann
Bing Li
MQ
AI4CE
77
0
0
25 Feb 2024