1808.05567
Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures
16 August 2018
E. Georganas
Sasikanth Avancha
K. Banerjee
Dhiraj D. Kalamkar
G. Henry
Hans Pabst
A. Heinecke
BDL
Papers citing
"Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures"
28 / 28 papers shown
Title
High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures
Xiang Fu
Xinpeng Zhang
Jixiang Ma
Peng Zhao
Shuai-bing Lu
Xu T. Liu
3DV
41
0
0
01 Aug 2024
ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation
Lucas Alvarenga
Victor Ferrari
Rafael Souza
M. Pereira
Guido Araujo
21
0
0
15 Jul 2024
LookupFFN: Making Transformers Compute-lite for CPU inference
Zhanpeng Zeng
Michael Davies
Pranav Pulijala
Karthikeyan Sankaralingam
Vikas Singh
38
5
0
12 Mar 2024
MG3MConv: Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor
Zheng-Kuo Wu
28
1
0
11 Jul 2023
Im2win: Memory Efficient Convolution On SIMD Architectures
Shuai-bing Lu
Jun Chu
Xuantong Liu
31
4
0
25 Jun 2023
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
E. Georganas
Dhiraj D. Kalamkar
K. Voronin
Abhisek Kundu
Antonio Noack
Hans Pabst
Alexander Breuer
A. Heinecke
16
2
0
25 Apr 2023
Kernel-Segregated Transpose Convolution Operation
Vijay Srinivas Tida
Sai Venkatesh Chilukoti
X. Hei
Sonya Hsu
ViT
28
2
0
08 Sep 2022
AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs
Chendi Li
Haipeng Jia
Hang Cao
Jianyu Yao
Boqian Shi
Chunyang Xiang
Jinbo Sun
Pengqi Lu
Yunquan Zhang
22
7
0
17 Aug 2022
Towards Transmission-Friendly and Robust CNN Models over Cloud and Device
Chuntao Ding
Zhichao Lu
F. Xu
Vishnu Boddeti
Yidong Li
Jiannong Cao
27
14
0
20 Jul 2022
Towards Effective Depthwise Convolutions on ARMv8 Architecture
Ruochen Hao
Qinglin Wang
Shangfei Yin
Tianyang Zhou
Siqi Shen
Songzhu Mei
Jie Liu
MQ
MDE
14
1
0
24 Jun 2022
Fast matrix multiplication for binary and ternary CNNs on ARM CPU
A. Trusov
E. Limonova
D. Nikolaev
V. Arlazarov
MQ
27
5
0
18 May 2022
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Lois Orosa
Skanda Koppula
Yaman Umuroglu
Konstantinos Kanellopoulos
Juan Gómez Luna
Michaela Blott
K. Vissers
O. Mutlu
48
4
0
04 Feb 2022
Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators
Yangjie Zhou
Mengtian Yang
Cong Guo
Jingwen Leng
Yun Liang
Quan Chen
Minyi Guo
Yuhao Zhu
34
34
0
08 Oct 2021
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads
E. Georganas
Dhiraj D. Kalamkar
Sasikanth Avancha
Menachem Adelman
Deepti Aggarwal
...
Ramanarayan Mohanty
Hans Pabst
Brian Retford
Barukh Ziv
A. Heinecke
42
17
0
12 Apr 2021
Applying the Roofline model for Deep Learning performance optimizations
Jacek Czaja
Michal Gallus
Joanna Wozna
Adam Grygielski
Luo Tao
8
3
0
23 Sep 2020
SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Stefanos Laskaridis
Stylianos I. Venieris
Mario Almeida
Ilias Leontiadis
Nicholas D. Lane
28
266
0
14 Aug 2020
Optimizing Grouped Convolutions on Edge Devices
Perry Gibson
José Cano
Jack Turner
Elliot J. Crowley
Michael F. P. O'Boyle
Amos Storkey
24
25
0
17 Jun 2020
Deploying Scientific AI Networks at Petaflop Scale on Secure Large Scale HPC Production Systems with Containers
D. Brayford
S. Vallecorsa
14
8
0
20 May 2020
Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Dhiraj D. Kalamkar
E. Georganas
Sudarshan Srinivasan
Jianping Chen
Mikhail Shiryaev
A. Heinecke
56
48
0
10 May 2020
The Parallelism Motifs of Genomic Data Analysis
Katherine Yelick
A. Buluç
M. Awan
A. Azad
Benjamin Brock
...
Giulia Guidi
S. Hofmeyr
Oguz Selvitopi
Cristina Teodoropol
L. Oliker
19
17
0
20 Jan 2020
High Performance Depthwise and Pointwise Convolutions on Mobile Devices
Pengfei Zhang
Eric Lo
Baotong Lu
19
34
0
03 Jan 2020
SparseTrain: Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors
Zhangxiaowen Gong
Houxiang Ji
Christopher W. Fletcher
C. Hughes
Josep Torrellas
36
5
0
22 Nov 2019
The Indirect Convolution Algorithm
Marat Dukhan
19
42
0
03 Jul 2019
High-Performance Deep Learning via a Single Building Block
E. Georganas
K. Banerjee
Dhiraj D. Kalamkar
Sasikanth Avancha
Anand Venkat
Michael J. Anderson
G. Henry
Hans Pabst
A. Heinecke
26
12
0
15 Jun 2019
A Study of BFLOAT16 for Deep Learning Training
Dhiraj D. Kalamkar
Dheevatsa Mudigere
Naveen Mellempudi
Dipankar Das
K. Banerjee
...
Sudarshan Srinivasan
Abhisek Kundu
M. Smelyanskiy
Bharat Kaul
Pradeep Dubey
MQ
30
338
0
29 May 2019
Distilling with Performance Enhanced Students
Jack Turner
Elliot J. Crowley
Valentin Radu
José Cano
Amos Storkey
Michael F. P. O'Boyle
24
3
0
24 Oct 2018
ISA Mapper: A Compute and Hardware Agnostic Deep Learning Compiler
Matthew Sotoudeh
Anand Venkat
Michael J. Anderson
E. Georganas
A. Heinecke
Jason Knight
19
9
0
12 Oct 2018
Optimizing CNN Model Inference on CPUs
Yizhi Liu
Yao Wang
Ruofei Yu
Mu Li
Vin Sharma
Yida Wang
12
152
0
07 Sep 2018