ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2006.06762
  4. Cited By
Ansor: Generating High-Performance Tensor Programs for Deep Learning

Ansor: Generating High-Performance Tensor Programs for Deep Learning

11 June 2020
Lianmin Zheng
Chengfan Jia
Minmin Sun
Zhao Wu
Cody Hao Yu
Ameer Haj-Ali
Yida Wang
Jun Yang
Danyang Zhuo
Koushik Sen
Joseph E. Gonzalez
Ion Stoica
ArXivPDFHTML

Papers citing "Ansor: Generating High-Performance Tensor Programs for Deep Learning"

50 / 53 papers shown
Title
QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives
QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives
X. Zhang
Shaohui Peng
Qirui Zhou
Yuanbo Wen
Qi Guo
...
Ke Gao
Chen Zhao
Yanjun Wu
Yunji Chen
Ling Li
VLM
39
0
0
08 May 2025
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Jianxing Qin
Jingrong Chen
Xinhao Kong
Yongji Wu
Liang Luo
Zihan Wang
Ying Zhang
Tingjun Chen
Alvin R. Lebeck
Danyang Zhuo
148
0
0
02 May 2025
Blockbuster, Part 1: Block-level AI Operator Fusion
Blockbuster, Part 1: Block-level AI Operator Fusion
Ofer Dekel
21
0
0
29 Apr 2025
TileLang: A Composable Tiled Programming Model for AI Systems
TileLang: A Composable Tiled Programming Model for AI Systems
Lei Wang
Yu Cheng
Yining Shi
Zhengju Tang
Zhiwen Mo
...
Lingxiao Ma
Yuqing Xia
Jilong Xue
Fan Yang
Zhiyong Yang
68
1
0
24 Apr 2025
Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis
Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis
Xinsong Zhang
Yaoyao Ding
Yang Hu
Gennady Pekhimenko
49
0
0
22 Apr 2025
Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving
Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving
Yaoyao Ding
Bohan Hou
X. Zhang
Allan Lin
Tianqi Chen
Cody Yu Hao
Yida Wang
Gennady Pekhimenko
50
0
0
17 Apr 2025
AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms
AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms
Feiyang Chen
Yu Cheng
Lei Wang
Yuqing Xia
Ziming Miao
...
Fan Yang
Jinbao Xue
Zhi Yang
M. Yang
H. Chen
81
1
0
24 Feb 2025
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards
Basar Kutukcu
Sinan Xie
Sabur Baidya
Sujit Dey
42
0
0
16 Feb 2025
Data-efficient Performance Modeling via Pre-training
Data-efficient Performance Modeling via Pre-training
Chunting Liu
Riyadh Baghdadi
43
0
0
24 Jan 2025
FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs
FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs
Yuanchang Zhou
Siyu Hu
Chen Wang
Lin-Wang Wang
Guangming Tan
Weile Jia
AI4CE
GNN
56
0
0
30 Dec 2024
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation
Mufei Li
Viraj Shitole
Eli Chien
Changhai Man
Zhaodong Wang
Srinivas Sridharan
Ying Zhang
Tushar Krishna
P. Li
41
0
0
04 Nov 2024
Explore as a Storm, Exploit as a Raindrop: On the Benefit of Fine-Tuning
  Kernel Schedulers with Coordinate Descent
Explore as a Storm, Exploit as a Raindrop: On the Benefit of Fine-Tuning Kernel Schedulers with Coordinate Descent
Michael Canesche
Gaurav Verma
Fernando Magno Quintao Pereira
21
1
0
28 Jun 2024
Scorch: A Library for Sparse Deep Learning
Scorch: A Library for Sparse Deep Learning
Bobby Yan
Alexander J. Root
Trevor Gale
David Broman
Fredrik Kjolstad
33
0
0
27 May 2024
Allo: A Programming Model for Composable Accelerator Design
Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen
Niansong Zhang
Shaojie Xiang
Zhichen Zeng
Mengjia Dai
Zhiru Zhang
54
14
0
07 Apr 2024
LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers
LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers
Massinissa Merouani
Khaled Afif Boudaoud
Iheb Nassim Aouadj
Nassim Tchoulak
Islam Kara Bernou
Hamza Benyamina
F. B. Tayeb
K. Benatchba
Hugh Leather
Riyadh Baghdadi
45
2
0
18 Mar 2024
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Ruihang Lai
Junru Shao
Siyuan Feng
Steven Lyubomirsky
Bohan Hou
...
Sunghyun Park
Prakalp Srivastava
Jared Roesch
T. Mowry
Tianqi Chen
47
9
0
01 Nov 2023
Target-independent XLA optimization using Reinforcement Learning
Target-independent XLA optimization using Reinforcement Learning
Milan Ganai
Haichen Li
Theodore Enns
Yida Wang
Randy Huang
39
0
0
28 Aug 2023
PowerFusion: A Tensor Compiler with Explicit Data Movement Description
  and Instruction-level Graph IR
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Zixuan Ma
Haojie Wang
Jingze Xing
Liyan Zheng
Chen Zhang
Huanqi Cao
Kezhao Huang
Shizhi Tang
Penghan Wang
Jidong Zhai
GNN
34
1
0
11 Jul 2023
SpecInfer: Accelerating Generative Large Language Model Serving with
  Tree-based Speculative Inference and Verification
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Zeyu Wang
...
Chunan Shi
Zhuoming Chen
Daiyaan Arfeen
Reyna Abhyankar
Zhihao Jia
LRM
65
120
0
16 May 2023
Full Stack Optimization of Transformer Inference: a Survey
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
101
0
27 Feb 2023
Operator Fusion in XLA: Analysis and Evaluation
Operator Fusion in XLA: Analysis and Evaluation
Danielle Snider
Ruofan Liang
24
4
0
30 Jan 2023
AGO: Boosting Mobile AI Inference Performance by Removing Constraints on
  Graph Optimization
AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization
Zhiying Xu
H. Peng
Wei Wang
GNN
26
3
0
02 Dec 2022
HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler
  for Neural Networks
HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks
Zining Zhang
Bingsheng He
Zhenjie Zhang
14
5
0
21 Nov 2022
ParCNetV2: Oversized Kernel with Enhanced Attention
ParCNetV2: Oversized Kernel with Enhanced Attention
Ruihan Xu
Haokui Zhang
Wenze Hu
Shiliang Zhang
Xiaoyu Wang
ViT
32
6
0
14 Nov 2022
AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse
  Matrices
AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices
Zhen Du
Jiajia Li
Yinshan Wang
Xueqi Li
Guangming Tan
N. Sun
24
21
0
07 Nov 2022
ALT: Boosting Deep Learning Performance by Breaking the Wall between
  Graph and Operator Level Optimizations
ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations
Zhiying Xu
Jiafan Xu
H. Peng
Wei Wang
Xiaoliang Wang
...
Haipeng Dai
Yixu Xu
Hao Cheng
Kun Wang
Guihai Chen
35
0
0
22 Oct 2022
Decompiling x86 Deep Neural Network Executables
Decompiling x86 Deep Neural Network Executables
Zhibo Liu
Yuanyuan Yuan
Shuai Wang
Xiaofei Xie
Lei Ma
AAML
45
13
0
03 Oct 2022
Optimizing DNN Compilation for Distributed Training with Joint OP and
  Tensor Fusion
Optimizing DNN Compilation for Distributed Training with Joint OP and Tensor Fusion
Xiaodong Yi
Shiwei Zhang
Lansong Diao
Chuan Wu
Zhen Zheng
Shiqing Fan
Siyu Wang
Jun Yang
W. Lin
39
4
0
26 Sep 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural
  Network Quantization
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
20
55
0
30 Aug 2022
SONAR: Joint Architecture and System Optimization Search
SONAR: Joint Architecture and System Optimization Search
Elias Jääsaari
Michelle Ma
Ameet Talwalkar
Tianqi Chen
38
1
0
25 Aug 2022
OLLIE: Derivation-based Tensor Program Optimizer
OLLIE: Derivation-based Tensor Program Optimizer
Liyan Zheng
Haojie Wang
Jidong Zhai
Muyan Hu
Zixuan Ma
Tuowei Wang
Shizhi Tang
Lei Xie
Kezhao Huang
Zhihao Jia
46
3
0
02 Aug 2022
NNSmith: Generating Diverse and Valid Test Cases for Deep Learning
  Compilers
NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers
Jiawei Liu
Jinkun Lin
Fabian Ruffy
Cheng Tan
Jinyang Li
Aurojit Panda
Lingming Zhang
76
57
0
26 Jul 2022
Productive Reproducible Workflows for DNNs: A Case Study for Industrial
  Defect Detection
Productive Reproducible Workflows for DNNs: A Case Study for Industrial Defect Detection
Perry Gibson
José Cano
AI4CE
32
1
0
19 Jun 2022
HW-Aware Initialization of DNN Auto-Tuning to Improve Exploration Time
  and Robustness
HW-Aware Initialization of DNN Auto-Tuning to Improve Exploration Time and Robustness
D. Rieber
Moritz Reiber
Oliver Bringmann
Holger Fröning
24
4
0
31 May 2022
Tensor Program Optimization with Probabilistic Programs
Tensor Program Optimization with Probabilistic Programs
Junru Shao
Xiyou Zhou
Siyuan Feng
Bohan Hou
Ruihang Lai
Hongyi Jin
Wuwei Lin
Masahiro Masuda
Cody Hao Yu
Tianqi Chen
37
29
0
26 May 2022
Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN
  Accelerators
Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators
Axel Stjerngren
Perry Gibson
José Cano
34
4
0
26 Apr 2022
Shisha: Online scheduling of CNN pipelines on heterogeneous
  architectures
Shisha: Online scheduling of CNN pipelines on heterogeneous architectures
Pirah Noor Soomro
M. Abduljabbar
J. Castrillón
Miquel Pericàs
24
1
0
23 Feb 2022
Benchmarking of DL Libraries and Models on Mobile Devices
Benchmarking of DL Libraries and Models on Mobile Devices
Qiyang Zhang
Xiang Li
Xiangying Che
Xiao Ma
Ao Zhou
Mengwei Xu
Shangguang Wang
Yun Ma
Xuanzhe Liu
25
48
0
14 Feb 2022
Learning from distinctive candidates to optimize reduced-precision
  convolution program on tensor cores
Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores
Junkyeong Choi
Hyucksung Kwon
W. Lee
Jungwook Choi
Jieun Lim
19
0
0
11 Feb 2022
Moses: Efficient Exploitation of Cross-device Transferable Features for
  Tensor Program Optimization
Moses: Efficient Exploitation of Cross-device Transferable Features for Tensor Program Optimization
Zhihe Zhao
Xian Shuai
Yang Bai
Neiwen Ling
Nan Guan
Zhenyu Yan
Guoliang Xing
28
6
0
15 Jan 2022
Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program
  Code Generation
Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation
Perry Gibson
José Cano
29
12
0
14 Jan 2022
FamilySeer: Towards Optimized Tensor Codes by Exploiting Computation
  Subgraph Similarity
FamilySeer: Towards Optimized Tensor Codes by Exploiting Computation Subgraph Similarity
Shanjun Zhang
Mingzhen Li
Hailong Yang
Yi Liu
Zhongzhi Luan
D. Qian
26
0
0
01 Jan 2022
Profile Guided Optimization without Profiles: A Machine Learning
  Approach
Profile Guided Optimization without Profiles: A Machine Learning Approach
Nadav Rotem
Chris Cummins
OffRL
25
7
0
24 Dec 2021
Bolt: Bridging the Gap between Auto-tuners and Hardware-native
  Performance
Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance
Jiarong Xing
Leyuan Wang
Shang Zhang
Jack H Chen
Ang Chen
Yibo Zhu
33
43
0
25 Oct 2021
CompilerGym: Robust, Performant Compiler Optimization Environments for
  AI Research
CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research
Chris Cummins
Bram Wasti
Jiadong Guo
Brandon Cui
Jason Ansel
...
Jia-Wei Liu
O. Teytaud
Benoit Steiner
Yuandong Tian
Hugh Leather
31
68
0
17 Sep 2021
1xN Pattern for Pruning Convolutional Neural Networks
1xN Pattern for Pruning Convolutional Neural Networks
Mingbao Lin
Yu-xin Zhang
Yuchao Li
Bohong Chen
Rongrong Ji
Mengdi Wang
Shen Li
Yonghong Tian
Rongrong Ji
3DPC
33
40
0
31 May 2021
Tuna: A Static Analysis Approach to Optimizing Deep Neural Networks
Tuna: A Static Analysis Approach to Optimizing Deep Neural Networks
Yao Wang
Xingyu Zhou
Yanming Wang
Rui Li
Yong Wu
Vin Sharma
24
8
0
29 Apr 2021
Tensor Processing Primitives: A Programming Abstraction for Efficiency
  and Portability in Deep Learning & HPC Workloads
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads
E. Georganas
Dhiraj D. Kalamkar
Sasikanth Avancha
Menachem Adelman
Deepti Aggarwal
...
Ramanarayan Mohanty
Hans Pabst
Brian Retford
Barukh Ziv
A. Heinecke
37
17
0
12 Apr 2021
DISC: A Dynamic Shape Compiler for Machine Learning Workloads
DISC: A Dynamic Shape Compiler for Machine Learning Workloads
Kai Zhu
Wenyi Zhao
Zhen Zheng
Tianyou Guo
Pengzhan Zhao
...
Junjie Bai
Jun Yang
Xiaoyong Liu
Lansong Diao
Wei Lin
27
27
0
09 Mar 2021
MetaTune: Meta-Learning Based Cost Model for Fast and Efficient
  Auto-tuning Frameworks
MetaTune: Meta-Learning Based Cost Model for Fast and Efficient Auto-tuning Frameworks
Jaehun Ryu
Hyojin Sung
57
16
0
08 Feb 2021
12
Next