2108.13342
DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
30 August 2021
Wei Niu
Jiexiong Guan
Yanzhi Wang
G. Agrawal
Bin Ren
AI4CE
Papers citing "DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion"
38 / 38 papers shown
Title
Blockbuster, Part 1: Block-level AI Operator Fusion
Ofer Dekel
21
0
0
29 Apr 2025
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah
Shan Lu
Chao Gao
Di Niu
72
0
0
20 Nov 2024
Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization
Yangjie Zhou
Honglin Zhu
Qian Qiu
Weihao Cui
Zihan Liu
...
Jintao Meng
Haidong Lan
Jingwen Leng
Wenxi Zhu
Minwen Deng
36
0
0
02 Sep 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Peng Sun
73
8
0
29 Jul 2024
Composing Distributed Computations Through Task and Kernel Fusion
Rohan Yadav
S. Sundram
Wonchan Lee
Michael Garland
Michael Bauer
Alex Aiken
Fredrik Kjolstad
41
1
0
26 Jun 2024
Accelerating Depthwise Separable Convolutions on Ultra-Low-Power Devices
Francesco Daghero
Alessio Burrello
M. Poncino
Enrico Macii
Daniele Jahier Pagliari
BDL
33
0
0
18 Jun 2024
Optimal Kernel Orchestration for Tensor Programs with Korch
Muyan Hu
Ashwin Venkatram
Shreyashri Biswas
Balamurugan Marimuthu
Bohan Hou
Gabriele Oliaro
Haojie Wang
Liyan Zheng
Xupeng Miao
Jidong Zhai
131
4
0
13 Jun 2024
Survey for Landing Generative AI in Social and E-commerce Recsys -- the Industry Perspectives
Da Xu
Danqing Zhang
Guangyu Yang
Bo Yang
Shuyuan Xu
Lingling Zheng
Cindy Liang
32
2
0
10 Jun 2024
Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls
Sicong Liu
Wentao Zhou
Zimu Zhou
Bin Guo
Minfan Wang
Cheng Fang
Zheng Lin
Zhiwen Yu
29
1
0
03 May 2024
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Wei Niu
Md. Musfiqur Rahman Sanim
Zhihao Shu
Jiexiong Guan
Xipeng Shen
Miao Yin
Gagan Agrawal
Bin Ren
30
6
0
21 Apr 2024
GeoT: Tensor Centric Library for Graph Neural Network via Efficient Segment Reduction on GPU
Zhongming Yu
Genghan Zhang
Hanxian Huang
Xin Chen
Jishen Zhao
GNN
29
0
0
03 Apr 2024
SoD²: Statically Optimizing Dynamic Deep Neural Network
Wei Niu
Gagan Agrawal
Bin Ren
33
4
0
29 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
41
48
0
15 Feb 2024
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey
Sicong Liu
Bin Guo
Cheng Fang
Ziqi Wang
Shiyan Luo
Zimu Zhou
Zhiwen Yu
AI4CE
34
22
0
27 Sep 2023
Grassroots Operator Search for Model Edge Adaptation
Hadjer Benmeziane
Kaoutar El Maghraoui
Hamza Ouarnoughi
Smail Niar
24
0
0
20 Sep 2023
Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges
Fei Dou
Jin Ye
Geng Yuan
Qin Lu
Wei Niu
...
Hongyue Sun
Yunli Shao
Changying Li
Tianming Liu
Wenzhan Song
AI4CE
37
29
0
14 Sep 2023
Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips
Ismet Dagli
M. E. Belviranli
27
8
0
10 Aug 2023
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Zixuan Ma
Haojie Wang
Jingze Xing
Liyan Zheng
Chen Zhang
Huanqi Cao
Kezhao Huang
Shizhi Tang
Penghan Wang
Jidong Zhai
GNN
34
1
0
11 Jul 2023
ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-based Systems
Mingyi Zhou
Xiang Gao
Jing Wu
John C. Grundy
Xiao Chen
Chunyang Chen
Li Li
AAML
33
12
0
01 Jun 2023
Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
Zixuan Jiang
Jiaqi Gu
Hanqing Zhu
David Z. Pan
AI4CE
30
16
0
24 May 2023
TorchBench: Benchmarking PyTorch with High API Surface Coverage
Yueming Hao
Xu Zhao
Bin Bao
David Berard
William Constable
Adnan Aziz
Xu Liu
30
5
0
27 Apr 2023
RAF: Holistic Compilation for Deep Learning Model Training
Cody Hao Yu
Haozheng Fan
Guangtai Huang
Zhen Jia
Yizhi Liu
...
Yuan Zhou
Haichen Shen
Junru Shao
Mu Li
Yida Wang
15
3
0
08 Mar 2023
Xenos: Dataflow-Centric Optimization to Accelerate Model Inference on Edge Devices
Runhua Zhang
Hongxu Jiang
Fangzheng Tian
Jinkun Geng
Xiaobin Li
Yuhang Ma
Chenhui Zhu
Dong Dong
Li Xin
Haojie Wang
9
4
0
01 Feb 2023
Operator Fusion in XLA: Analysis and Evaluation
Danielle Snider
Ruofan Liang
24
4
0
30 Jan 2023
AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization
Zhiying Xu
H. Peng
Wei Wang
GNN
26
3
0
02 Dec 2022
ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs
Guyue Huang
Yang Bai
L. Liu
Yuke Wang
Bei Yu
Yufei Ding
Yuan Xie
52
16
0
29 Oct 2022
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner
Mostafa Elhoushi
Jacob Kahn
James Hegarty
29
8
0
24 Oct 2022
ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations
Zhiying Xu
Jiafan Xu
H. Peng
Wei Wang
Xiaoliang Wang
...
Haipeng Dai
Yixu Xu
Hao Cheng
Kun Wang
Guihai Chen
20
0
0
22 Oct 2022
Inference Latency Prediction at the Edge
Zhuojin Li
Marco Paolieri
L. Golubchik
27
3
0
06 Oct 2022
DreamShard: Generalizable Embedding Table Placement for Recommender Systems
Daochen Zha
Louis Feng
Qiaoyu Tan
Zirui Liu
Kwei-Herng Lai
Bhargav Bhushanam
Yuandong Tian
A. Kejariwal
Xia Hu
LMTD
OffRL
30
28
0
05 Oct 2022
Efficient Adaptive Activation Rounding for Post-Training Quantization
Zhengyi Li
Cong Guo
Zhanda Zhu
Yangjie Zhou
Yuxian Qiu
Xiaotian Gao
Jingwen Leng
Minyi Guo
MQ
30
3
0
25 Aug 2022
OLLIE: Derivation-based Tensor Program Optimizer
Liyan Zheng
Haojie Wang
Jidong Zhai
Muyan Hu
Zixuan Ma
Tuowei Wang
Shizhi Tang
Lei Xie
Kezhao Huang
Zhihao Jia
46
3
0
02 Aug 2022
CoCoPIE XGen: A Full-Stack AI-Oriented Optimizing Framework
Xiaofeng Li
Bin Ren
Xipeng Shen
Yanzhi Wang
GNN
25
0
0
21 Jun 2022
A Survey of Multi-Tenant Deep Learning Inference on GPU
Fuxun Yu
Di Wang
Longfei Shangguan
Minjia Zhang
Chenchen Liu
Xiang Chen
BDL
AI4CE
26
32
0
17 Mar 2022
DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators
Sheng-Chun Kao
Xiaoyu Huang
T. Krishna
AI4CE
35
9
0
26 Jan 2022
Collage: Seamless Integration of Deep Learning Backends with Automatic Placement
Byungsoo Jeon
Sunghyun Park
Peiyuan Liao
Sheng Xu
Tianqi Chen
Zhihao Jia
VLM
36
4
0
01 Nov 2021
Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective
Hengrui Zhang
Zhongming Yu
Guohao Dai
Guyue Huang
Yufei Ding
Yuan Xie
Yu Wang
GNN
22
46
0
18 Oct 2021
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao
Suvinay Subramanian
Gaurav Agrawal
Amir Yazdanbakhsh
T. Krishna
38
57
0
13 Jul 2021