ResearchTrend.AI

In-Datacenter Performance Analysis of a Tensor Processing Unit

arXiv:1704.04760

16 April 2017
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
Raminder Bajwa
Sarah Bates
Suresh Bhatia
Nan Boden
Al Borchers
Rick Boyle
Pierre-luc Cantin
Clifford Chao
Chris Clark
Jeremy Coriell
Mike Daley
Matt Dau
Jeffrey Dean
Ben Gelb
Taraneh Ghaemmaghami
Rajendra Gottipati
William Gulland
Robert Hagmann
C. Richard Ho
Doug Hogberg
John Hu
R. Hundt
Dan Hurt
Julian Ibarz
A. Jaffey
Alek Jaworski
Alexander Kaplan
Harshit Khaitan
Andy Koch
Naveen Kumar
Steve Lacy
James Laudon
James Law
Diemthu Le
Chris Leary
Zhuyuan Liu
Kyle Lucke
Alan Lundin
Gordon MacKean
Adriana Maggiore
Maire Mahony
Kieran Miller
R. Nagarajan
Ravi Narayanaswami
Ray Ni
Kathy Nix
Thomas Norrie
Mark Omernick
Narayana Penukonda
Andy Phelps
Jonathan Ross
Matt Ross
Amir Salek
Emad Samadiani
Chris Severn
Gregory Sizikov
Matthew Snelham
Jed Souter
Dan Steinberg
Andy Swing
Mercedes Tan
Gregory Thorson
Bo Tian
Horia Toma
Erick Tuttle
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon

Papers citing "In-Datacenter Performance Analysis of a Tensor Processing Unit"

50 / 1,164 papers shown
SpikeX: Exploring Accelerator Architecture and Network-Hardware Co-Optimization for Sparse Spiking Neural Networks
Boxun Xu
Richard Boone
Peng Li
7
0
0
18 May 2025
LLM-DSE: Searching Accelerator Parameters with LLM Agents
Hanyu Wang
Xinrui Wu
Zijian Ding
Su Zheng
Chengyue Wang
Tony Nowatzki
Yizhou Sun
Jason Cong
2
0
0
18 May 2025
Analog Foundation Models
Julian Büchel
Iason Chalas
Giovanni Acampa
An Chen
Omobayode Fagbohungbe
Sidney Tsai
Kaoutar El Maghraoui
Manuel Le Gallo
Abbas Rahimi
Abu Sebastian
MQ
35
0
0
14 May 2025
QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives
X. Zhang
Shaohui Peng
Qirui Zhou
Yuanbo Wen
Qi Guo
...
Ke Gao
Chen Zhao
Yanjun Wu
Yunji Chen
Ling Li
VLM
39
0
0
08 May 2025
QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach
Shouyang Dong
Yuanbo Wen
Jun Bi
Di Huang
Jiaming Guo
...
Yifan Hao
Xuehai Zhou
Tianshi Chen
Qi Guo
Yunji Chen
32
0
0
04 May 2025
Nonlinear Computation with Linear Optics via Source-Position Encoding
N. Richardson
C. Bosch
R. P. Adams
39
0
0
29 Apr 2025
Efficient and Asymptotically Unbiased Constrained Decoding for Large Language Models
Haotian Ye
Himanshu Jain
Chong You
A. Suresh
Haowei Lin
James Zou
Felix X. Yu
36
0
0
12 Apr 2025
Low-Bit Integerization of Vision Transformers using Operand Reodering for Efficient Hardware
Ching-Yi Lin
Sahil Shah
MQ
73
0
0
11 Apr 2025
Quattro: Transformer-Accelerated Iterative Linear Quadratic Regulator Framework for Fast Trajectory Optimization
Yue Wang
Hoayu Wang
Zhaoxing Li
49
0
0
02 Apr 2025
Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers
Chaojian Li
Sixu Li
Linrui Jiang
Jingqun Zhang
Yingyan Lin
39
0
0
31 Mar 2025
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
Xuan Shen
Weize Ma
Jing Liu
Changdi Yang
Rui Ding
...
Wei Niu
Yanzhi Wang
Pu Zhao
Jun Lin
Jiuxiang Gu
MQ
57
0
0
20 Mar 2025
Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs
Qizhe Wu
Huawen Liang
Yuchen Gui
Zhichen Zeng
Z. He
...
Letian Zhao
Zhaoxi Zeng
W. Yuan
Wei Wu
Xi Jin
49
0
0
08 Mar 2025
FORTALESA: Fault-Tolerant Reconfigurable Systolic Array for DNN Inference
N. Cherezova
Artur Jutman
M. Jenihhin
70
0
0
06 Mar 2025
Strassen Multisystolic Array Hardware Architectures
Trevor E. Pogue
N. Nicolici
76
0
0
14 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
70
3
0
11 Feb 2025
PiKE: Adaptive Data Mixing for Multi-Task Learning Under Low Gradient Conflicts
Zeman Li
Yuan Deng
Peilin Zhong
Meisam Razaviyayn
Vahab Mirrokni
MoMe
75
1
0
10 Feb 2025
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference
Nikhil Bhendawade
Mahyar Najibi
Devang Naik
Irina Belousova
MoE
85
0
0
04 Feb 2025
Life-Cycle Emissions of AI Hardware: A Cradle-To-Grave Approach and Generational Trends
Ian Schneider
Hui Xu
Stephan Benecke
David Patterson
Keguo Huang
Parthasarathy Ranganathan
Cooper Elsworth
70
2
0
01 Feb 2025
A Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression
Shupeng Ning
Hanqing Zhu
Chenghao Feng
Jiaqi Gu
David Z. Pan
Ray T. Chen
42
0
0
01 Feb 2025
SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity
Zichen Fan
Steve Dai
Rangharajan Venkatesan
Dennis Sylvester
Brucek Khailany
MQ
55
0
0
28 Jan 2025
LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
Guoyu Li
Shengyu Ye
Chong Chen
Yang Wang
Fan Yang
Ting Cao
Cheng Liu
Mohamed M. Sabry
Mao Yang
MQ
169
0
0
18 Jan 2025
Karatsuba Matrix Multiplication and its Efficient Custom Hardware Implementations
Trevor E. Pogue
N. Nicolici
63
0
0
15 Jan 2025
tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI
Harideep Nair
P. Vellaisamy
Albert Chen
Joseph Finn
Anna Li
Manav Trivedi
J. Shen
29
2
0
23 Dec 2024
Leveraging Highly Approximated Multipliers in DNN Inference
Leveraging Highly Approximated Multipliers in DNN Inference
Georgios Zervakis
Fabio Frustaci
Ourania Spantidi
Iraklis Anagnostopoulos
H. Amrouch
Jörg Henkel
84
1
0
21 Dec 2024
PreNeT: Leveraging Computational Features to Predict Deep Neural Network Training Time
Alireza Pourali
Arian Boukani
Hamzeh Khazaei
72
0
0
20 Dec 2024
Optimal Gradient Checkpointing for Sparse and Recurrent Architectures using Off-Chip Memory
Wadjih Bencheikh
Jan Finkbeiner
Emre Neftci
76
0
0
16 Dec 2024
A comprehensive GeoAI review: Progress, Challenges and Outlooks
Anasse Boutayeb
Iyad Lahsen-cherif
Ahmed El Khadimi
89
0
0
16 Dec 2024
The Evolution and Future Perspectives of Artificial Intelligence Generated Content
Chengzhang Zhu
Luobin Cui
Ying Tang
Jiacun Wang
92
1
0
02 Dec 2024
A Parallel Scan Algorithm in the Tensor Core Unit Model
Anastasios Zouzias
William F. McColl
LRM
57
0
0
26 Nov 2024
SoK: Decentralized AI (DeAI)
Zhipeng Wang
Rui Sun
Elizabeth Lui
Vatsal Shah
Xihan Xiong
Jiahao Sun
Davide Crapis
William Knottenbelt
104
1
0
26 Nov 2024
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
Yu Zhang
Ming Wang
Lancheng Zou
Wulong Liu
Hui-Ling Zhen
M. Yuan
Bei Yu
MQ
79
1
0
25 Nov 2024
SoK: A Systems Perspective on Compound AI Threats and Countermeasures
Sarbartha Banerjee
Prateek Sahu
Mulong Luo
Anjo Vahldiek-Oberwagner
N. Yadwadkar
Mohit Tiwari
AAML
77
0
0
20 Nov 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
257
3
0
20 Nov 2024
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah
Shan Lu
Chao Gao
Di Niu
75
0
0
20 Nov 2024
Running Markov Chain Monte Carlo on Modern Hardware and Software
Pavel Sountsov
Colin Carroll
Matthew D. Hoffman
BDL
39
3
0
06 Nov 2024
DP-HLS: A High-Level Synthesis Framework for Accelerating Dynamic Programming Algorithms in Bioinformatics
Yingqi Cao
Anshu Gupta
Jason Liang
Yatish Turakhia
23
0
0
05 Nov 2024
Trustworthy Federated Learning: Privacy, Security, and Beyond
Chunlu Chen
Ji Liu
Haowen Tan
Xingjian Li
Kevin I-Kai Wang
Peng Li
Kouichi Sakurai
Dejing Dou
FedML
52
4
0
03 Nov 2024
Revisiting Reliability in Large-Scale Machine Learning Research Clusters
Apostolos Kokolis
Michael Kuchnik
John Hoffman
Adithya Kumar
Parth Malani
Faye Ma
Zachary DeVito
Shri Kiran Srinivasan
Kalyan Saladi
Carole-Jean Wu
178
7
0
29 Oct 2024
Design Space Exploration of Embedded SoC Architectures for Real-Time Optimal Control
Kris Shengjun Dong
Dima Nikiforov
Widyadewi Soedarmadji
Minh Nguyen
Christopher Fletcher
Y. Shao
21
0
0
16 Oct 2024
Efficiera Residual Networks: Hardware-Friendly Fully Binary Weight with 2-bit Activation Model Achieves Practical ImageNet Accuracy
Shuntaro Takahashi
Takuya Wakisaka
Hiroyuki Tokunaga
MQ
37
0
0
15 Oct 2024
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI
Arya Tschand
Arun Tejusve Raghunath Rajan
S. Idgunji
Anirban Ghosh
J. Holleman
...
Rowan Taubitz
Sean Zhan
Scott Wasson
David Kanter
Vijay Janapa Reddi
64
3
0
15 Oct 2024
Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
Ruhai Lin
Rui-Jie Zhu
Jason Eshraghian
46
1
0
12 Oct 2024
Data Efficiency for Large Recommendation Models
Kshitij Jain
Jingru Xie
Kevin Regan
Cheng Chen
Jie Han
...
Todd Phillips
Myles Sussman
Matt Troup
Angel Yu
Jia Zhuo
OffRL
30
0
0
08 Oct 2024
RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices
Kam Chi Loong
Shihao Han
Sishuo Liu
Ning Lin
Zhongrui Wang
24
0
0
27 Sep 2024
A method of using RSVD in residual calculation of LowBit GEMM
Hongyaoxing Gu
MQ
35
0
0
27 Sep 2024
QuForge: A Library for Qudits Simulation
T. S. Farias
Lucas Friedrich
Jonas Maziero
31
2
0
26 Sep 2024
Ascend HiFloat8 Format for Deep Learning
Yuanyong Luo
Zhongxing Zhang
Richard Wu
Hu Liu
Ying Jin
...
Korviakov Vladimir
Bobrin Maxim
Yuhao Hu
Guanfu Chen
Zeyi Huang
MQ
30
1
0
25 Sep 2024
FreeRide: Harvesting Bubbles in Pipeline Parallelism
Jiashu Zhang
Zihan Pan
Molly Xu
Khuzaima S. Daudjee
90
0
0
11 Sep 2024
Say No to Freeloader: Protecting Intellectual Property of Your Deep Model
Lianyu Wang
Ming Wang
Huazhu Fu
Daoqiang Zhang
42
2
0
23 Aug 2024
When In-memory Computing Meets Spiking Neural Networks -- A Perspective on Device-Circuit-System-and-Algorithm Co-design
Abhishek Moitra
Abhiroop Bhattacharjee
Yuhang Li
Youngeun Kim
Priyadarshini Panda
30
1
0
22 Aug 2024