Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures

16 August 2018
E. Georganas
Sasikanth Avancha
K. Banerjee
Dhiraj D. Kalamkar
G. Henry
Hans Pabst
A. Heinecke
    BDL

Papers citing "Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures"

28 / 28 papers shown
High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures
Xiang Fu
Xinpeng Zhang
Jixiang Ma
Peng Zhao
Shuai-bing Lu
Xu T. Liu
3DV
41
0
0
01 Aug 2024
ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation
Lucas Alvarenga
Victor Ferrari
Rafael Souza
M. Pereira
Guido Araujo
21
0
0
15 Jul 2024
LookupFFN: Making Transformers Compute-lite for CPU inference
Zhanpeng Zeng
Michael Davies
Pranav Pulijala
Karthikeyan Sankaralingam
Vikas Singh
38
5
0
12 Mar 2024
MG3MConv: Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor
Zheng-Kuo Wu
28
1
0
11 Jul 2023
Im2win: Memory Efficient Convolution On SIMD Architectures
Shuai-bing Lu
Jun Chu
Xuantong Liu
31
4
0
25 Jun 2023
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
E. Georganas
Dhiraj D. Kalamkar
K. Voronin
Abhisek Kundu
Antonio Noack
Hans Pabst
Alexander Breuer
A. Heinecke
16
2
0
25 Apr 2023
Kernel-Segregated Transpose Convolution Operation
Vijay Srinivas Tida
Sai Venkatesh Chilukoti
X. Hei
Sonya Hsu
ViT
28
2
0
08 Sep 2022
AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs
Chendi Li
Haipeng Jia
Hang Cao
Jianyu Yao
Boqian Shi
Chunyang Xiang
Jinbo Sun
Pengqi Lu
Yunquan Zhang
22
7
0
17 Aug 2022
Towards Transmission-Friendly and Robust CNN Models over Cloud and Device
Chuntao Ding
Zhichao Lu
F. Xu
Vishnu Boddeti
Yidong Li
Jiannong Cao
27
14
0
20 Jul 2022
Towards Effective Depthwise Convolutions on ARMv8 Architecture
Ruochen Hao
Qinglin Wang
Shangfei Yin
Tianyang Zhou
Siqi Shen
Songzhu Mei
Jie Liu
MQ
MDE
14
1
0
24 Jun 2022
Fast matrix multiplication for binary and ternary CNNs on ARM CPU
A. Trusov
E. Limonova
D. Nikolaev
V. Arlazarov
MQ
27
5
0
18 May 2022
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Lois Orosa
Skanda Koppula
Yaman Umuroglu
Konstantinos Kanellopoulos
Juan Gómez Luna
Michaela Blott
K. Vissers
O. Mutlu
48
4
0
04 Feb 2022
Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators
Yangjie Zhou
Mengtian Yang
Cong Guo
Jingwen Leng
Yun Liang
Quan Chen
Minyi Guo
Yuhao Zhu
34
34
0
08 Oct 2021
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads
E. Georganas
Dhiraj D. Kalamkar
Sasikanth Avancha
Menachem Adelman
Deepti Aggarwal
...
Ramanarayan Mohanty
Hans Pabst
Brian Retford
Barukh Ziv
A. Heinecke
42
17
0
12 Apr 2021
Applying the Roofline model for Deep Learning performance optimizations
Jacek Czaja
Michal Gallus
Joanna Wozna
Adam Grygielski
Luo Tao
8
3
0
23 Sep 2020
SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Stefanos Laskaridis
Stylianos I. Venieris
Mario Almeida
Ilias Leontiadis
Nicholas D. Lane
28
266
0
14 Aug 2020
Optimizing Grouped Convolutions on Edge Devices
Perry Gibson
José Cano
Jack Turner
Elliot J. Crowley
Michael F. P. O'Boyle
Amos Storkey
24
25
0
17 Jun 2020
Deploying Scientific AI Networks at Petaflop Scale on Secure Large Scale HPC Production Systems with Containers
D. Brayford
S. Vallecorsa
14
8
0
20 May 2020
Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Dhiraj D. Kalamkar
E. Georganas
Sudarshan Srinivasan
Jianping Chen
Mikhail Shiryaev
A. Heinecke
56
48
0
10 May 2020
The Parallelism Motifs of Genomic Data Analysis
Katherine Yelick
A. Buluç
M. Awan
A. Azad
Benjamin Brock
...
Giulia Guidi
S. Hofmeyr
Oguz Selvitopi
Cristina Teodoropol
L. Oliker
19
17
0
20 Jan 2020
High Performance Depthwise and Pointwise Convolutions on Mobile Devices
Pengfei Zhang
Eric Lo
Baotong Lu
19
34
0
03 Jan 2020
SparseTrain: Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors
Zhangxiaowen Gong
Houxiang Ji
Christopher W. Fletcher
C. Hughes
Josep Torrellas
36
5
0
22 Nov 2019
The Indirect Convolution Algorithm
Marat Dukhan
19
42
0
03 Jul 2019
High-Performance Deep Learning via a Single Building Block
E. Georganas
K. Banerjee
Dhiraj D. Kalamkar
Sasikanth Avancha
Anand Venkat
Michael J. Anderson
G. Henry
Hans Pabst
A. Heinecke
26
12
0
15 Jun 2019
A Study of BFLOAT16 for Deep Learning Training
Dhiraj D. Kalamkar
Dheevatsa Mudigere
Naveen Mellempudi
Dipankar Das
K. Banerjee
...
Sudarshan Srinivasan
Abhisek Kundu
M. Smelyanskiy
Bharat Kaul
Pradeep Dubey
MQ
30
338
0
29 May 2019
Distilling with Performance Enhanced Students
Jack Turner
Elliot J. Crowley
Valentin Radu
José Cano
Amos Storkey
Michael F. P. O'Boyle
24
3
0
24 Oct 2018
ISA Mapper: A Compute and Hardware Agnostic Deep Learning Compiler
Matthew Sotoudeh
Anand Venkat
Michael J. Anderson
E. Georganas
A. Heinecke
Jason Knight
19
9
0
12 Oct 2018
Optimizing CNN Model Inference on CPUs
Yizhi Liu
Yao Wang
Ruofei Yu
Mu Li
Vin Sharma
Yida Wang
12
152
0
07 Sep 2018