Optimizing CNN Model Inference on CPUs

7 September 2018
Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, Yida Wang

Papers citing "Optimizing CNN Model Inference on CPUs"

47 / 47 papers shown

Compiler Optimization via LLM Reasoning for Efficient Model Serving
Sujun Tang, Christopher Priebe, R. Mahapatra, Lianhui Qin, H. Esmaeilzadeh
LRM
02 Jun 2025

CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories
Man Shi, Steven Colleman, Charlotte VanDeMieroop, Antony Joseph, M. Meijer, W. Dehaene, Marian Verhelst
14 Jun 2024

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Wei Niu, Md. Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren
21 Apr 2024

LookupFFN: Making Transformers Compute-lite for CPU inference
Zhanpeng Zeng, Michael Davies, Pranav Pulijala, Karthikeyan Sankaralingam, Vikas Singh
12 Mar 2024

JITSPMM: Just-in-Time Instruction Generation for Accelerated Sparse Matrix-Matrix Multiplication
Qiang Fu, Thomas B. Rolinger, H. H. Huang
09 Dec 2023

SySMOL: Co-designing Algorithms and Hardware for Neural Networks with Heterogeneous Precisions
Cyrus Zhou, Pedro H. P. Savarese, Vaughn Richard, Zack Hassman, Xin Yuan, Michael Maire, Michael DiBrino, Yanjing Li
MQ
23 Nov 2023

YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs
Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaughn Richard, Yanjing Li
01 Oct 2023

Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs
Zinuo Cai, Hao Wang, Tao Song, Yang Hua, Ruhui Ma, Haibing Guan
GNN
21 Jul 2023

KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow
Zhiyao Li, Mingyu Gao
09 Jun 2023

OpenHLS: High-Level Synthesis for Low-Latency Deep Neural Networks for Experimental Science
Maksim Levental, A. Khan, Kyle Chard, Kazutomo Yoshi, Ryan Chard, Ian Foster
13 Feb 2023

Improving Inference Performance of Machine Learning with the Divide-and-Conquer Principle
Alex Kogan
LRM
12 Jan 2023

Accelerating CNN inference on long vector architectures via co-design
Sonia Rani Gupta, Nikela Papadopoulou, Miquel Pericàs
3DV
22 Dec 2022

ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations
Zhiying Xu, Jiafan Xu, H. Peng, Wei Wang, Xiaoliang Wang, ..., Haipeng Dai, Yixu Xu, Hao Cheng, Kun Wang, Guihai Chen
22 Oct 2022

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs
Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko
18 Oct 2022

Decompiling x86 Deep Neural Network Executables
Zhibo Liu, Yuanyuan Yuan, Shuai Wang, Xiaofei Xie, Lei Ma
AAML
03 Oct 2022

Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tile
Renzo Andri, Beatrice Bussolino, A. Cipolletta, Lukas Cavigelli, Zhe Wang
MQ
26 Sep 2022

Tensor Program Optimization with Probabilistic Programs
Junru Shao, Xiyou Zhou, Siyuan Feng, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Masahiro Masuda, Cody Hao Yu, Tianqi Chen
26 May 2022

Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware
B. Sudharsan, Dineshkumar Sundaram, Pankesh Patel, J. Breslin, M. Ali, Schahram Dustdar, Albert Zomaya, R. Ranjan
20 Apr 2022

Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs
Taebum Kim, Eunji Jeong, Geonyong Kim, Yunmo Koo, Sehoon Kim, Gyeong-In Yu, Byung-Gon Chun
AI4CE
23 Jan 2022

A Highly Configurable Hardware/Software Stack for DNN Inference Acceleration
Suvadeep Banerjee, Steve Burns, P. Cocchini, A. Davare, Shweta Jain, D. Kirkpatrick, A. Sorokin, Jin Yang, Zhenkun Yang
29 Nov 2021

SoftNeuro: Fast Deep Inference using Multi-platform Optimization
Masaki Hilaga, Yasuhiro Kuroda, Hitoshi Matsuo, Tatsuya Kawaguchi, Gabriel Ogawa, Hiroshi Miyake, Yusuke Kozawa
12 Oct 2021

Does Form Follow Function? An Empirical Exploration of the Impact of Deep Neural Network Architecture Design on Hardware-Specific Acceleration
Saad Abbasi, M. Shafiee, Ellick Chan, Alexander Wong
08 Jul 2021

Bring Your Own Codegen to Deep Learning Compiler
Zhi Chen, Cody Hao Yu, Trevor Morris, Jorn Tuyls, Yi-Hsiang Lai, Jared Roesch, Elliott Delaye, Vin Sharma, Yida Wang
03 May 2021

Tuna: A Static Analysis Approach to Optimizing Deep Neural Networks
Yao Wang, Xingyu Zhou, Yanming Wang, Rui Li, Yong Wu, Vin Sharma
29 Apr 2021

Joint Program and Layout Transformations to enable Convolutional Operators on Specialized Hardware based on Constraint Programming
D. Rieber, Axel Acosta, Holger Fröning
10 Apr 2021

Optimizing Inference Performance of Transformers on CPUs
D. Dice, Alex Kogan
12 Feb 2021

UNIT: Unifying Tensorized Instruction Compilation
Jian Weng, Animesh Jain, Jie Wang, Leyuan Wang, Yida Wang, Tony Nowatzki
21 Jan 2021

SparseDNN: Fast Sparse Deep Learning Inference on CPUs
Ziheng Wang
MQ
20 Jan 2021

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead
Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Mohamed Bennai
BDL
21 Dec 2020

Cortex: A Compiler for Recursive Deep Learning Models
Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, T. Mowry
VLM
02 Nov 2020

FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems
Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang
GNN
26 Aug 2020

SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane
14 Aug 2020

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations
Yongchao Liu, Yue Jin, Yongqi Chen, Teng Teng, Hang Ou, Rui Zhao, Yao Zhang
11 Aug 2020

Spatial Sharing of GPU for Autotuning DNN models
Aditya Dhakal, Junguk Cho, Sameer G. Kulkarni, K. Ramakrishnan, P. Sharma
08 Aug 2020

Efficient Execution of Quantized Deep Learning Models: A Compiler Approach
Animesh Jain, Shoubhik Bhattacharya, Masahiro Masuda, Vin Sharma, Yida Wang
MQ
18 Jun 2020

Ansor: Generating High-Performance Tensor Programs for Deep Learning
Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, ..., Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica
11 Jun 2020

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference
Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang
04 Jun 2020

AutoScale: Optimizing Energy Efficiency of End-to-End Edge Inference under Stochastic Variance
Young Geun Kim, Carole-Jean Wu
06 May 2020

Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices
Byung Hoon Ahn, Jinwon Lee, J. Lin, Hsin-Pai Cheng, Jilei Hou, H. Esmaeilzadeh
04 Mar 2020

Optimizing Memory-Access Patterns for Deep Learning Accelerators
Hongbin Zheng, Sejong Oh, Huiqing Wang, Preston Briggs, J. Gai, Animesh Jain, Yizhi Liu, Rich Heaton, Randy Huang, Yida Wang
27 Feb 2020

Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation
Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, H. Esmaeilzadeh
ODL
23 Jan 2020

Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models
Matthew LeMay, Shijian Li, Tian Guo
05 Dec 2019

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs
Leyuan Wang, Zhi Chen, Yizhi Liu, Yao Wang, Lianmin Zheng, Mu Li, Yida Wang
03 Jul 2019

High-Performance Deep Learning via a Single Building Block
E. Georganas, K. Banerjee, Dhiraj D. Kalamkar, Sasikanth Avancha, Anand Venkat, Michael J. Anderson, G. Henry, Hans Pabst, A. Heinecke
15 Jun 2019

Stripe: Tensor Compilation via the Nested Polyhedral Model
Tim Zerrell, J. Bruestle
14 Mar 2019

High-Throughput CNN Inference on Embedded ARM big.LITTLE Multi-Core Processors
Siqi Wang, Gayathri Ananthanarayanan, Yifan Zeng, Neeraj Goel, A. Pathania, T. Mitra
14 Mar 2019

DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-based CNN Accelerators
Yu Xing, Shuang Liang, Lingzhi Sui, Xijie Jia, Jiantao Qiu, Xin Liu, Yushun Wang, Yu Wang, Yi Shan
20 Feb 2019