arXiv:1802.04799
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

12 February 2018
Tianqi Chen
T. Moreau
Ziheng Jiang
Lianmin Zheng
Eddie Q. Yan
M. Cowan
Haichen Shen
Leyuan Wang
Yuwei Hu
Luis Ceze
Carlos Guestrin
Arvind Krishnamurthy

Papers citing "TVM: An Automated End-to-End Optimizing Compiler for Deep Learning"

50 / 67 papers shown
Bridging Control-Centric and Data-Centric Optimization
Tal Ben-Nun
Berke Ates
A. Calotoiu
Torsten Hoefler
31
7
0
01 Jun 2023
AMULET: Adaptive Matrix-Multiplication-Like Tasks
Junyoung Kim
Kenneth Ross
Eric Sedlar
Lukas Stadler
13
1
0
12 May 2023
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
Shixun Wu
Yujia Zhai
Jinyang Liu
Jiajun Huang
Zizhe Jian
Bryan M. Wong
Zizhong Chen
29
13
0
01 May 2023
Operator Fusion in XLA: Analysis and Evaluation
Danielle Snider
Ruofan Liang
24
4
0
30 Jan 2023
A Study on the Intersection of GPU Utilization and CNN Inference
J. Kosaian
Amar Phanishayee
23
3
0
15 Dec 2022
Demand Layering for Real-Time DNN Inference with Minimized Memory Usage
Min-Zhi Ji
Saehanseul Yi
Chang-Mo Koo
Sol Ahn
Dongjoo Seo
N. Dutt
Jong-Chan Kim
42
16
0
08 Oct 2022
Optimizing DNN Compilation for Distributed Training with Joint OP and Tensor Fusion
Xiaodong Yi
Shiwei Zhang
Lansong Diao
Chuan Wu
Zhen Zheng
Shiqing Fan
Siyu Wang
Jun Yang
W. Lin
39
4
0
26 Sep 2022
Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution
Yushu Wu
Yifan Gong
Pu Zhao
Yanyu Li
Zheng Zhan
Wei Niu
Hao Tang
Minghai Qin
Bin Ren
Yanzhi Wang
SupR
MQ
35
23
0
25 Jul 2022
On Efficient Approximate Queries over Machine Learning Models
Dujian Ding
S. Amer-Yahia
L. Lakshmanan
24
5
0
06 Jun 2022
Tensor Program Optimization with Probabilistic Programs
Junru Shao
Xiyou Zhou
Siyuan Feng
Bohan Hou
Ruihang Lai
Hongyi Jin
Wuwei Lin
Masahiro Masuda
Cody Hao Yu
Tianqi Chen
37
29
0
26 May 2022
Toward smart composites: small-scale, untethered prediction and control for soft sensor/actuator systems
Sarah Aguasvivas Manzano
V. Sundaram
Artemis Xu
K. Ly
M. Rentschler
R. Shepherd
N. Correll
35
5
0
22 May 2022
Learning to Reverse DNNs from AI Programs Automatically
Simin Chen
Hamed Khanpour
Cong Liu
Wei Yang
35
15
0
20 May 2022
Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators
Axel Stjerngren
Perry Gibson
José Cano
34
4
0
26 Apr 2022
Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware
B. Sudharsan
Dineshkumar Sundaram
Pankesh Patel
J. Breslin
M. Ali
Schahram Dustdar
Albert Zomaya
R. Ranjan
18
2
0
20 Apr 2022
Special Session: Towards an Agile Design Methodology for Efficient, Reliable, and Secure ML Systems
Shail Dave
Alberto Marchisio
Muhammad Abdullah Hanif
Amira Guesmi
Aviral Shrivastava
Ihsen Alouani
Muhammad Shafique
34
13
0
18 Apr 2022
PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems
Yuanxing Zhang
Langshi Chen
Siran Yang
Man Yuan
Hui-juan Yi
...
Yong Li
Dingyang Zhang
Wei Lin
Lin Qu
Bo Zheng
35
32
0
11 Apr 2022
Federated Remote Physiological Measurement with Imperfect Data
Xin Liu
Mingchuan Zhang
Ziheng Jiang
Shwetak N. Patel
Daniel J. McDuff
27
12
0
11 Mar 2022
Query Processing on Tensor Computation Runtimes
Dong He
Supun Nakandala
Dalitso Banda
Rathijit Sen
Karla Saur
Kwanghyun Park
Carlo Curino
Jesús Camacho-Rodríguez
Konstantinos Karanasos
Matteo Interlandi
27
35
0
03 Mar 2022
Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation
Jiawei Liu
Yuxiang Wei
Sen Yang
Yinlin Deng
Lingming Zhang
33
41
0
21 Feb 2022
Quantune: Post-training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment
Jemin Lee
Misun Yu
Yongin Kwon
Teaho Kim
MQ
25
17
0
10 Feb 2022
Moses: Efficient Exploitation of Cross-device Transferable Features for Tensor Program Optimization
Zhihe Zhao
Xian Shuai
Yang Bai
Neiwen Ling
Nan Guan
Zhenyu Yan
Guoliang Xing
28
6
0
15 Jan 2022
Automated Deep Learning: Neural Architecture Search Is Not the End
Xuanyi Dong
D. Kedziora
Katarzyna Musial
Bogdan Gabrys
25
26
0
16 Dec 2021
Revisiting Neuron Coverage for DNN Testing: A Layer-Wise and Distribution-Aware Criterion
Yuanyuan Yuan
Qi Pang
Shuai Wang
43
22
0
03 Dec 2021
Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration
Yifan Gong
Geng Yuan
Zheng Zhan
Wei Niu
Zhengang Li
...
Sijia Liu
Bin Ren
Xue Lin
Xulong Tang
Yanzhi Wang
28
10
0
22 Nov 2021
NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM
Connor Holmes
Minjia Zhang
Yuxiong He
Bo Wu
37
18
0
28 Oct 2021
A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays
Leonardo Ravaglia
Manuele Rusci
D. Nadalini
Alessandro Capotondi
Francesco Conti
Luca Benini
BDL
39
64
0
20 Oct 2021
SoftNeuro: Fast Deep Inference using Multi-platform Optimization
Masaki Hilaga
Yasuhiro Kuroda
Hitoshi Matsuo
Tatsuya Kawaguchi
Gabriel Ogawa
Hiroshi Miyake
Yusuke Kozawa
26
1
0
12 Oct 2021
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device
Ji Lin
Chuang Gan
Kuan-Chieh Jackson Wang
Song Han
40
64
0
27 Sep 2021
NeurObfuscator: A Full-stack Obfuscation Tool to Mitigate Neural Architecture Stealing
Jingtao Li
Zhezhi He
Adnan Siraj Rakin
Deliang Fan
C. Chakrabarti
29
24
0
20 Jul 2021
CoSA: Scheduling by Constrained Optimization for Spatial Accelerators
Qijing Huang
Minwoo Kang
Grace Dinh
Thomas Norell
Aravind Kalaiah
J. Demmel
J. Wawrzynek
Y. Shao
23
105
0
05 May 2021
Tuna: A Static Analysis Approach to Optimizing Deep Neural Networks
Yao Wang
Xingyu Zhou
Yanming Wang
Rui Li
Yong Wu
Vin Sharma
24
8
0
29 Apr 2021
Arithmetic-Intensity-Guided Fault Tolerance for Neural Network Inference on GPUs
J. Kosaian
K. V. Rashmi
38
33
0
19 Apr 2021
Do We Need Anisotropic Graph Neural Networks?
Shyam A. Tailor
Felix L. Opolka
Pietro Lio
Nicholas D. Lane
46
34
0
03 Apr 2021
Automated Backend-Aware Post-Training Quantization
Ziheng Jiang
Animesh Jain
An Liu
Josh Fromm
Chengqian Ma
Tianqi Chen
Luis Ceze
MQ
37
2
0
27 Mar 2021
MetaTune: Meta-Learning Based Cost Model for Fast and Efficient Auto-tuning Frameworks
Jaehun Ryu
Hyojin Sung
57
16
0
08 Feb 2021
Pruning and Quantization for Deep Neural Network Acceleration: A Survey
Tailin Liang
C. Glossner
Lei Wang
Shaobo Shi
Xiaotong Zhang
MQ
150
675
0
24 Jan 2021
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
105
341
0
05 Jan 2021
Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework
Sung-En Chang
Yanyu Li
Mengshu Sun
Runbin Shi
Hayden Kwok-Hay So
Xuehai Qian
Yanzhi Wang
Xue Lin
MQ
26
82
0
08 Dec 2020
Larq Compute Engine: Design, Benchmark, and Deploy State-of-the-Art Binarized Neural Networks
T. Bannink
Arash Bakhtiari
Adam Hillier
Lukas Geiger
T. D. Bruin
Leon Overweel
J. Neeven
K. Helwegen
3DV
MQ
13
36
0
18 Nov 2020
Customizing Trusted AI Accelerators for Efficient Privacy-Preserving Machine Learning
Peichen Xie
Xuanle Ren
Guangyu Sun
FedML
12
6
0
12 Nov 2020
Long Document Ranking with Query-Directed Sparse Transformer
Jyun-Yu Jiang
Chenyan Xiong
Chia-Jung Lee
Wei Wang
30
25
0
23 Oct 2020
TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems
R. David
Jared Duke
Advait Jain
Vijay Janapa Reddi
Nat Jeffries
...
Meghna Natraj
Shlomi Regev
Rocky Rhodes
Tiezhen Wang
Pete Warden
119
466
0
17 Oct 2020
A Tensor Compiler for Unified Machine Learning Prediction Serving
Supun Nakandala
Karla Saur
Gyeong-In Yu
Konstantinos Karanasos
Carlo Curino
Markus Weimer
Matteo Interlandi
24
53
0
09 Oct 2020
HAPI: Hardware-Aware Progressive Inference
Stefanos Laskaridis
Stylianos I. Venieris
Hyeji Kim
Nicholas D. Lane
22
45
0
10 Aug 2020
Spatial Sharing of GPU for Autotuning DNN models
Aditya Dhakal
Junguk Cho
Sameer G. Kulkarni
K. Ramakrishnan
P. Sharma
19
8
0
08 Aug 2020
Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM
Zijing Gu
15
1
0
26 Jul 2020
MCUNet: Tiny Deep Learning on IoT Devices
Ji Lin
Wei-Ming Chen
Yujun Lin
J. Cohn
Chuang Gan
Song Han
79
475
0
20 Jul 2020
DCAF: A Dynamic Computation Allocation Framework for Online Serving System
Biye Jiang
Pengye Zhang
Rihan Chen
Binding Dai
Xinchen Luo
Yifan Yang
Guan Wang
Guorui Zhou
Xiaoqiang Zhu
Kun Gai
8
15
0
17 Jun 2020
STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators
Francisco Munoz-Martínez
José L. Abellán
M. Acacio
T. Krishna
27
11
0
10 Jun 2020
GEVO: GPU Code Optimization using Evolutionary Computation
Jhe-Yu Liou
Xiaodong Wang
Stephanie Forrest
Carole-Jean Wu
30
2
0
17 Apr 2020