arXiv:2305.18236
Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering
15 May 2023
Braedy Kuzma
Ivan Korostelev
J. P. L. Carvalho
José Moreira
Christopher Barton
Guido Araujo
J. N. Amaral
Papers citing
"Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering"
5 / 5 papers shown
Title
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Jens Domke
Emil Vatai
Aleksandr Drozd
Peng Chen
Yosuke Oyama
...
Shweta Salaria
Daichi Mukunoki
Artur Podobas
Mohamed Wahib
Satoshi Matsuoka
56
25
0
27 Oct 2020
Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors
C. Alappat
Johannes Hofmann
G. Hager
Holger Fehske
A. Bishop
G. Wellein
31
17
0
09 Feb 2020
NVIDIA Tensor Core Programmability, Performance & Precision
Stefano Markidis
Steven W. D. Chien
Erwin Laure
Ivy Bo Peng
Jeffrey S. Vetter
42
374
0
11 Mar 2018
In-Datacenter Performance Analysis of a Tensor Processing Unit
N. Jouppi
C. Young
Nishant Patil
David Patterson
Gaurav Agrawal
...
Vijay Vasudevan
Richard Walter
Walter Wang
Eric Wilcox
Doe Hyun Yoon
235
4,638
0
16 Apr 2017
Parallel Multi Channel Convolution using General Matrix Multiplication
Aravind Vasudevan
Andrew Anderson
David Gregg
49
140
0
06 Apr 2017