1704.04760
Cited By
In-Datacenter Performance Analysis of a Tensor Processing Unit
16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-Luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
Papers citing
"In-Datacenter Performance Analysis of a Tensor Processing Unit"
50 / 1,164 papers shown
Title
Matmul or No Matmal in the Era of 1-bit LLMs
Jinendra Malekar
Mohammed E. Elbtity
Ramtin Zand
MQ
32
2
0
21 Aug 2024
High Performance Unstructured SpMM Computation Using Tensor Cores
Patrik Okanovic
Grzegorz Kwaśniewski
P. S. Labini
Maciej Besta
Flavio Vella
Torsten Hoefler
49
5
0
21 Aug 2024
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Viraat Aryabumi
Yixuan Su
Raymond Ma
Adrien Morisot
Ivan Zhang
Acyr Locatelli
Marzieh Fadaee
Ahmet Üstün
Sara Hooker
SyDa
AI4CE
48
20
0
20 Aug 2024
Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture
Yu Feng
Weikai Lin
Zihan Liu
Jingwen Leng
Minyi Guo
Han Zhao
Xiaofeng Hou
Jieru Zhao
Yuhao Zhu
36
3
0
13 Aug 2024
Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10
Yiqi Liu
Yuqi Xue
Yu Cheng
Lingxiao Ma
Ziming Miao
Jilong Xue
Jian Huang
GNN
26
1
0
09 Aug 2024
Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms
Yuqi Xue
Yiqi Liu
Lifeng Nai
Jian Huang
24
0
0
07 Aug 2024
LLM-Aided Compilation for Tensor Accelerators
Charles Hong
Sahil Bhatia
Altan Haan
Shengjun Kris Dong
Dima Nikiforov
Alvin Cheung
Y. Shao
37
0
0
06 Aug 2024
PENDRAM: Enabling High-Performance and Energy-Efficient Processing of Deep Neural Networks through a Generalized DRAM Data Mapping Policy
Rachmad Vidya Wicaksana Putra
Muhammad Abdullah Hanif
Muhammad Shafique
33
0
0
05 Aug 2024
Optical Computing for Deep Neural Network Acceleration: Foundations, Recent Developments, and Emerging Directions
S. Pasricha
67
0
0
30 Jul 2024
An Asynchronous Multi-core Accelerator for SNN inference
Zhuo Chen
De Ma
Xiaofei Jin
Qinghui Xing
Ouwen Jin
Xin Du
Shuibing He
Gang Pan
23
0
0
30 Jul 2024
The Magnificent Seven Challenges and Opportunities in Domain-Specific Accelerator Design for Autonomous Systems
Sabrina M. Neuman
Brian Plancher
Vijay Janapa Reddi
46
0
0
24 Jul 2024
Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi-Pumped Soft SIMD Operations
Giorgos Armeniakos
Alexis Maras
S. Xydis
Dimitrios Soudris
MQ
26
3
0
19 Jul 2024
Integrated Hardware Architecture and Device Placement Search
Irene Wang
Jakub Tarnawski
Amar Phanishayee
Divya Mahajan
41
1
0
18 Jul 2024
Enhancing Split Computing and Early Exit Applications through Predefined Sparsity
Luigi Capogrosso
Enrico Fraccaroli
Giulio Petrozziello
Francesco Setti
Samarjit Chakraborty
Franco Fummi
Marco Cristani
36
3
0
16 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
V. Cevher
Yida Wang
George Karypis
45
3
0
12 Jul 2024
Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture
Mohammed E. Elbtity
Peyton S. Chandarana
Ramtin Zand
54
2
0
11 Jul 2024
Privacy-Preserving and Trustworthy Deep Learning for Medical Imaging
Kiarash Sedghighadikolaei
Attila A Yavuz
39
1
0
29 Jun 2024
How to Rent GPUs on a Budget
Zhouzi Li
Benjamin Berg
Arpan Mukhopadhyay
Mor Harchol-Balter
23
0
0
21 Jun 2024
Older and Wiser: The Marriage of Device Aging and Intellectual Property Protection of Deep Neural Networks
Ning Lin
Shaocong Wang
Yue Zhang
Yangu He
Kwunhang Wong
Arindam Basu
Dashan Shang
Xiaoming Chen
Zhongrui Wang
AAML
39
0
0
21 Jun 2024
AI in Space for Scientific Missions: Strategies for Minimizing Neural-Network Model Upload
Jonah Ekelund
Ricardo Vinuesa
Yuri Khotyaintsev
Pierre Henri
G. Delzanno
Stefano Markidis
33
0
0
20 Jun 2024
Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies
Johannes Pekkilä
Oskar Lappi
Fredrik Robertsén
Maarit J. Korpi-Lagg
22
0
0
13 Jun 2024
SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling
Sai Sanjeet
B. Sahoo
Keshab K. Parhi
51
0
0
11 Jun 2024
Diversified Batch Selection for Training Acceleration
Feng Hong
Yueming Lyu
Jiangchao Yao
Ya Zhang
Ivor W. Tsang
Yanfeng Wang
42
4
0
07 Jun 2024
Differentiable Combinatorial Scheduling at Scale
Mingju Liu
Yingjie Li
Jiaqi Yin
Zhiru Zhang
Cunxi Yu
40
0
0
06 Jun 2024
To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability
Joonhyung Lee
Jeongin Bae
Byeongwook Kim
S. Kwon
Dongsoo Lee
MQ
49
0
0
29 May 2024
Understanding Transformer Reasoning Capabilities via Graph Algorithms
Clayton Sanford
Bahare Fatemi
Ethan Hall
Anton Tsitsulin
Seyed Mehran Kazemi
Jonathan J. Halcrow
Bryan Perozzi
Vahab Mirrokni
46
30
0
28 May 2024
Carbon Connect: An Ecosystem for Sustainable Computing
Benjamin C. Lee
David Brooks
Arthur van Benthem
Udit Gupta
G. Hills
...
Emma Strubell
Gu-Yeon Wei
Adam Wierman
Yuan Yao
Minlan Yu
25
2
0
22 May 2024
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
Angeline Pouget
Lucas Beyer
Emanuele Bugliarello
Xiao Wang
Andreas Steiner
Xiao-Qi Zhai
Ibrahim M. Alabdulmohsin
VLM
33
7
0
22 May 2024
Generative AI for the Optimization of Next-Generation Wireless Networks: Basics, State-of-the-Art, and Open Challenges
Fahime Khoramnejad
Ekram Hossain
40
7
0
22 May 2024
Cost-Effective Fault Tolerance for CNNs Using Parameter Vulnerability Based Hardening and Pruning
Mohammad Hasan Ahmadilivani
Seyedhamidreza Mousavi
J. Raik
Masoud Daneshtalab
M. Jenihhin
AAML
45
3
0
17 May 2024
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
R. Prabhakar
R. Sivaramakrishnan
Darshan Gandhi
Yun Du
Mingran Wang
...
Urmish Thakker
Dawei Huang
Sumti Jairath
Kevin J. Brown
K. Olukotun
MoE
39
12
0
13 May 2024
Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights
Soyed Tuhin Ahmed
Michael Hefenbrock
M. Tahoori
UQCV
39
1
0
07 May 2024
SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators
Mohanad Odema
Luke Chen
Hyoukjun Kwon
Mohammad Abdullah Al Faruque
36
4
0
01 May 2024
Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
Muhammad Adnan
Amar Phanishayee
Janardhan Kulkarni
Prashant J. Nair
Divya Mahajan
45
0
0
23 Apr 2024
Automated Text Mining of Experimental Methodologies from Biomedical Literature
Ziqing Guo
34
1
0
21 Apr 2024
Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations
Yu Feng
Zihan Liu
Jingwen Leng
Minyi Guo
Yuhao Zhu
49
8
0
18 Apr 2024
Trackable Agent-based Evolution Models at Wafer Scale
Matthew Andres Moreno
Connor Yang
Emily L. Dolson
Luis Zaman
44
3
0
16 Apr 2024
I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey
Noah Lewis
J. L. Bez
Suren Byna
57
0
0
16 Apr 2024
Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator
Abhishek Tyagi
Reiley Jeyapaul
Chuteng Zhu
Paul N. Whatmough
Yuhao Zhu
24
0
0
14 Apr 2024
Bullion: A Column Store for Machine Learning
Gang Liao
Ye Liu
Jianjun Chen
Daniel J. Abadi
37
5
0
13 Apr 2024
Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen
Niansong Zhang
Shaojie Xiang
Zhichen Zeng
Mengjia Dai
Zhiru Zhang
54
14
0
07 Apr 2024
Enhancing Trust and Privacy in Distributed Networks: A Comprehensive Survey on Blockchain-based Federated Learning
Ji Liu
Chunlu Chen
Yu Li
Lin Sun
Yulun Song
Jingbo Zhou
Bo Jing
Dejing Dou
50
9
0
28 Mar 2024
Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings
Yassaman Ebrahimzadeh Maboud
Muhammad Adnan
Divya Mahajan
Prashant J. Nair
AI4TS
40
0
0
22 Mar 2024
DG-RePlAce: A Dataflow-Driven GPU-Accelerated Analytical Global Placement Framework for Machine Learning Accelerators
Andrew B. Kahng
Zhiang Wang
24
2
0
16 Mar 2024
FlexNN: A Dataflow-aware Flexible Deep Learning Accelerator for Energy-Efficient Edge Devices
Arnab Raha
Deepak A. Mathaikutty
Soumendu Kumar Ghosh
Shamik Kundu
19
7
0
14 Mar 2024
Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition
Geonhwa Jeong
Po-An Tsai
Abhimanyu Bambhaniya
S. Keckler
Tushar Krishna
33
5
0
12 Mar 2024
EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural Network Acceleration
Bo Liu
Grace Li Zhang
Xunzhao Yin
Ulf Schlichtmann
Bing Li
MQ
AI4CE
38
0
0
25 Feb 2024
CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware
Souvik Kundu
Anthony Sarah
Vinay Joshi
O. J. Omer
S. Subramoney
29
0
0
19 Feb 2024
Accelerating Sparse DNNs Based on Tiled GEMM
Cong Guo
Fengchen Xue
Jingwen Leng
Yuxian Qiu
Yue Guan
Weihao Cui
Quan Chen
Minyi Guo
21
10
0
16 Feb 2024
A Precision-Optimized Fixed-Point Near-Memory Digital Processing Unit for Analog In-Memory Computing
Elena Ferro
A. Vasilopoulos
Corey Lammie
Manuel Le Gallo
Luca Benini
I. Boybat
Abu Sebastian
17
0
0
12 Feb 2024