Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures

16 August 2018

Papers citing "Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures"

High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures. Xiang Fu, Xinpeng Zhang, Jixiang Ma, Peng Zhao, Shuai-bing Lu, Xu T. Liu. 01 Aug 2024.
ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation. Lucas Alvarenga, Victor Ferrari, Rafael Souza, M. Pereira, Guido Araujo. 15 Jul 2024.
LookupFFN: Making Transformers Compute-lite for CPU inference. Zhanpeng Zeng, Michael Davies, Pranav Pulijala, Karthikeyan Sankaralingam, Vikas Singh. 12 Mar 2024.
MG3MConv: Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor. Zheng-Kuo Wu. 11 Jul 2023.
Im2win: Memory Efficient Convolution On SIMD Architectures. Shuai-bing Lu, Jun Chu, Xuantong Liu. 25 Jun 2023.
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures. E. Georganas, Dhiraj D. Kalamkar, K. Voronin, Abhisek Kundu, Antonio Noack, Hans Pabst, Alexander Breuer, A. Heinecke. 25 Apr 2023.
Kernel-Segregated Transpose Convolution Operation. Vijay Srinivas Tida, Sai Venkatesh Chilukoti, X. Hei, Sonya Hsu. 08 Sep 2022.
AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs. Chendi Li, Haipeng Jia, Hang Cao, Jianyu Yao, Boqian Shi, Chunyang Xiang, Jinbo Sun, Pengqi Lu, Yunquan Zhang. 17 Aug 2022.
Towards Transmission-Friendly and Robust CNN Models over Cloud and Device. Chuntao Ding, Zhichao Lu, F. Xu, Vishnu Boddeti, Yidong Li, Jiannong Cao. 20 Jul 2022.
Towards Effective Depthwise Convolutions on ARMv8 Architecture. Ruochen Hao, Qinglin Wang, Shangfei Yin, Tianyang Zhou, Siqi Shen, Songzhu Mei, Jie Liu. 24 Jun 2022.
Fast matrix multiplication for binary and ternary CNNs on ARM CPU. A. Trusov, E. Limonova, D. Nikolaev, V. Arlazarov. 18 May 2022.
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators. Lois Orosa, Skanda Koppula, Yaman Umuroglu, Konstantinos Kanellopoulos, Juan Gómez Luna, Michaela Blott, K. Vissers, O. Mutlu. 04 Feb 2022.
Characterizing and Demystifying the Implicit Convolution Algorithm on Commercial Matrix-Multiplication Accelerators. Yangjie Zhou, Mengtian Yang, Cong Guo, Jingwen Leng, Yun Liang, Quan Chen, Minyi Guo, Yuhao Zhu. 08 Oct 2021.
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads. E. Georganas, Dhiraj D. Kalamkar, Sasikanth Avancha, Menachem Adelman, Deepti Aggarwal, ..., Ramanarayan Mohanty, Hans Pabst, Brian Retford, Barukh Ziv, A. Heinecke. 12 Apr 2021.
Applying the Roofline model for Deep Learning performance optimizations. Jacek Czaja, Michal Gallus, Joanna Wozna, Adam Grygielski, Luo Tao. 23 Sep 2020.
SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud. Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane. 14 Aug 2020.
Optimizing Grouped Convolutions on Edge Devices. Perry Gibson, José Cano, Jack Turner, Elliot J. Crowley, Michael F. P. O'Boyle, Amos Storkey. 17 Jun 2020.
Deploying Scientific AI Networks at Petaflop Scale on Secure Large Scale HPC Production Systems with Containers. D. Brayford, S. Vallecorsa. 20 May 2020.
Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures. Dhiraj D. Kalamkar, E. Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, A. Heinecke. 10 May 2020.
The Parallelism Motifs of Genomic Data Analysis. Katherine Yelick, A. Buluç, M. Awan, A. Azad, Benjamin Brock, ..., Giulia Guidi, S. Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, L. Oliker. 20 Jan 2020.
High Performance Depthwise and Pointwise Convolutions on Mobile Devices. Pengfei Zhang, Eric Lo, Baotong Lu. 03 Jan 2020.
SparseTrain: Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors. Zhangxiaowen Gong, Houxiang Ji, Christopher W. Fletcher, C. Hughes, Josep Torrellas. 22 Nov 2019.
The Indirect Convolution Algorithm. Marat Dukhan. 03 Jul 2019.
High-Performance Deep Learning via a Single Building Block. E. Georganas, K. Banerjee, Dhiraj D. Kalamkar, Sasikanth Avancha, Anand Venkat, Michael J. Anderson, G. Henry, Hans Pabst, A. Heinecke. 15 Jun 2019.
A Study of BFLOAT16 for Deep Learning Training. Dhiraj D. Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, K. Banerjee, ..., Sudarshan Srinivasan, Abhisek Kundu, M. Smelyanskiy, Bharat Kaul, Pradeep Dubey. 29 May 2019.
Distilling with Performance Enhanced Students. Jack Turner, Elliot J. Crowley, Valentin Radu, José Cano, Amos Storkey, Michael F. P. O'Boyle. 24 Oct 2018.
ISA Mapper: A Compute and Hardware Agnostic Deep Learning Compiler. Matthew Sotoudeh, Anand Venkat, Michael J. Anderson, E. Georganas, A. Heinecke, Jason Knight. 12 Oct 2018.
Optimizing CNN Model Inference on CPUs. Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, Yida Wang. 07 Sep 2018.