2308.15152
Reducing shared memory footprint to leverage high throughput on Tensor Cores and its flexible API extension library
29 August 2023
Hiroyuki Ootomo
Rio Yokota
Papers citing
"Reducing shared memory footprint to leverage high throughput on Tensor Cores and its flexible API extension library"
6 / 6 papers shown
Title
Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance
Hiroyuki Ootomo
Rio Yokota
38
32
0
07 Mar 2022
tcFFT: Accelerating Half-Precision FFT through Tensor Cores
Bin-Rui Li
Shenggan Cheng
James Lin
16
12
0
23 Apr 2021
A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels
Peng Chen
Mohamed Wahib
Shin'ichiro Takizawa
Ryousei Takano
Satoshi Matsuoka
28
22
0
14 Jul 2019
Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking
Zhe Jia
Marco Maggioni
Benjamin Staiger
D. Scarpazza
48
309
0
18 Apr 2018
NVIDIA Tensor Core Programmability, Performance & Precision
Stefano Markidis
Steven W. D. Chien
Erwin Laure
Ivy Bo Peng
Jeffrey S. Vetter
36
372
0
11 Mar 2018
In-Datacenter Performance Analysis of a Tensor Processing Unit
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
...
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
213
4,626
0
16 Apr 2017