ResearchTrend.AI · Papers · 2411.00907 · Cited By
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
arXiv:2411.00907 (versions v1–v3) · 1 November 2024
Jaskirat Singh, Bram Adams, Ahmed E. Hassan
VLM

Papers citing "On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance"

Showing 50 of 73 citing papers.
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
  Jaskirat Singh, Emad Fallahzadeh, Bram Adams, Ahmed E. Hassan
  MQ · 25 Mar 2024 · 121/3/0

Efficient Post-training Quantization with FP8 Formats
  Haihao Shen, Naveen Mellempudi, Xin He, Q. Gao, Chang-Bao Wang, Mengni Wang
  MQ · 26 Sep 2023 · 58/23/0

Pruning vs Quantization: Which is Better?
  Andrey Kuzmin, Markus Nagel, M. V. Baalen, Arash Behboodi, Tijmen Blankevoort
  MQ · 06 Jul 2023 · 112/51/0

Adaptive DNN Surgery for Selfish Inference Acceleration with On-demand Edge Resource
  Xiang Yang, Deliang Chen, Q. Qi, Jingyu Wang, Haifeng Sun, J. Liao, Song Guo
  21 Jun 2023 · 124/3/0

QuaLA-MiniLM: a Quantized Length Adaptive MiniLM
  Shira Guskin, Moshe Wasserblat, Chang Wang, Haihao Shen
  MQ · 31 Oct 2022 · 58/2/0

Fast DistilBERT on CPUs
  Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat
  VLM · 27 Oct 2022 · 42/2/0

Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
  Rajiv Movva, Jinhao Lei, Shayne Longpre, Ajay K. Gupta, Chris DuBois
  VLM, MQ · 20 Aug 2022 · 59/5/0

An Empirical Study of Challenges in Converting Deep Learning Models
  Moses Openja, Amin Nikanjam, Ahmed Haj Yahmed, Foutse Khomh, Zhen Ming, Zhengyong Jiang
  AAML · 28 Jun 2022 · 98/19/0

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training
  Charbel Sakr, Steve Dai, Rangharajan Venkatesan, B. Zimmer, W. Dally, Brucek Khailany
  MQ · 13 Jun 2022 · 64/41/0

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization
  Peng Hu, Xi Peng, Erik Cambria, M. Aly, Jie Lin
  MQ · 23 May 2022 · 91/61/0

Overcoming Oscillations in Quantization-Aware Training
  Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort
  MQ · 21 Mar 2022 · 172/108/0

SC2 Benchmark: Supervised Compression for Split Computing
  Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt
  16 Mar 2022 · 102/20/0

BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing
  Yoshitomo Matsubara, Davide Callegaro, Sameer Singh, Marco Levorato, Francesco Restuccia
  07 Jan 2022 · 58/41/0

Prune Once for All: Sparse Pre-Trained Language Models
  Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
  VLM · 10 Nov 2021 · 58/85/0

Auto-Split: A General Framework of Collaborative Edge-Cloud AI
  Amin Banitalebi-Dehkordi, Naveen Vedula, J. Pei, Fei Xia, Lanjun Wang, Yong Zhang
  30 Aug 2021 · 64/92/0

Supervised Compression for Resource-Constrained Edge Computing Systems
  Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt
  21 Aug 2021 · 87/58/0

PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation
  Jang-Hyun Kim, Simyung Chang, Nojun Kwak
  25 Jun 2021 · 61/45/0

Post-Training Sparsity-Aware Quantization
  Gil Shomron, F. Gabbay, Samer Kurzum, U. Weiser
  MQ · 23 May 2021 · 76/34/0

Single-Training Collaborative Object Detectors Adaptive to Bandwidth and Computation
  Juliano S. Assine, José Cândido Silveira Santos Filho, Eduardo Valle
  ObjD · 03 May 2021 · 84/8/0

Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference
  B. Hawks, Javier Mauricio Duarte, Nicholas J. Fraser, Alessandro Pappalardo, N. Tran, Yaman Umuroglu
  MQ · 22 Feb 2021 · 52/51/0

Confounding Tradeoffs for Neural Network Quantization
  Sahaj Garg, Anirudh Jain, Joe Lou, Mitchell Nahmias
  MQ · 12 Feb 2021 · 66/18/0

Dynamic Precision Analog Computing for Neural Networks
  Sahaj Garg, Joe Lou, Anirudh Jain, Mitchell Nahmias
  12 Feb 2021 · 63/33/0

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
  Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, F. Yu, Wei Wang, Shi Gu
  MQ · 10 Feb 2021 · 138/444/0

KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization
  Jing Jin, Cai Liang, Tiancheng Wu, Li Zou, Zhiliang Gan
  MQ · 15 Jan 2021 · 55/27/0

Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search
  Mingzhu Shen, Feng Liang, Ruihao Gong, Yuhang Li, Chuming Li, Chen Lin, F. Yu, Junjie Yan, Wanli Ouyang
  MQ · 09 Oct 2020 · 63/39/0

Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks
  Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung
  MQ · 30 Sep 2020 · 62/30/0

Degree-Quant: Quantization-Aware Training for Graph Neural Networks
  Shyam A. Tailor, Javier Fernandez-Marques, Nicholas D. Lane
  GNN, MQ · 11 Aug 2020 · 50/145/0

Neural Compression and Filtering for Edge-assisted Real-time Object Detection in Challenged Networks
  Yoshitomo Matsubara, Marco Levorato
  31 Jul 2020 · 55/54/0

Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming
  Itay Hubara, Yury Nahshan, Y. Hanani, Ron Banner, Daniel Soudry
  MQ · 14 Jun 2020 · 107/128/0

Knowledge Distillation: A Survey
  Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao
  VLM · 09 Jun 2020 · 162/2,986/0

Bayesian Bits: Unifying Quantization and Pruning
  M. V. Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling
  MQ · 14 May 2020 · 79/116/0

Up or Down? Adaptive Rounding for Post-Training Quantization
  Markus Nagel, Rana Ali Amjad, M. V. Baalen, Christos Louizos, Tijmen Blankevoort
  MQ · 22 Apr 2020 · 92/586/0

LSQ+: Improving low-bit quantization through learnable offsets and better initialization
  Yash Bhalgat, Jinwon Lee, Markus Nagel, Tijmen Blankevoort, Nojun Kwak
  MQ · 20 Apr 2020 · 62/222/0

Training with Quantization Noise for Extreme Model Compression
  Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin
  MQ · 15 Apr 2020 · 99/246/0

FastBERT: a Self-distilling BERT with Adaptive Inference Time
  Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
  05 Apr 2020 · 84/360/0

Understanding and Improving Knowledge Distillation
  Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain
  10 Feb 2020 · 88/133/0

Post-Training Piecewise Linear Quantization for Deep Neural Networks
  Jun Fang, Ali Shafiee, Hamzah Abdel-Aziz, D. Thorsley, Georgios Georgiadis, Joseph Hassoun
  MQ · 31 Jan 2020 · 71/147/0

ZeroQ: A Novel Zero Shot Quantization Framework
  Yaohui Cai, Z. Yao, Zhen Dong, A. Gholami, Michael W. Mahoney, Kurt Keutzer
  MQ · 01 Jan 2020 · 98/399/0

QKD: Quantization-aware Knowledge Distillation
  Jangho Kim, Yash Bhalgat, Jinwon Lee, Chirag I. Patel, Nojun Kwak
  MQ · 28 Nov 2019 · 90/66/0

Model Pruning Enables Efficient Federated Learning on Edge Devices
  Yuang Jiang, Shiqiang Wang, Victor Valls, Bongjun Ko, Wei-Han Lee, Kin K. Leung, Leandros Tassiulas
  26 Sep 2019 · 97/463/0

Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence
  Shuiguang Deng, Hailiang Zhao, Weijia Fang, Yuxiang Cai, Schahram Dustdar, Albert Y. Zomaya
  02 Sep 2019 · 100/616/0

Improved Techniques for Training Adaptive Deep Networks
  Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, Gao Huang
  17 Aug 2019 · 69/132/0

Machine Learning at the Network Edge: A Survey
  M. G. Sarwar Murshed, Chris Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain
  31 Jul 2019 · 60/384/0

RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
  AIMat · 26 Jul 2019 · 686/24,557/0

Convergence of Edge Computing and Deep Learning: A Comprehensive Survey
  Xiaofei Wang, Yiwen Han, Victor C. M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen
  19 Jul 2019 · 83/998/0

Wireless Federated Distillation for Distributed Edge Learning with Heterogeneous Data
  Jinhyun Ahn, Osvaldo Simeone, Joonhyuk Kang
  FedML · 05 Jul 2019 · 53/109/0

Data-Free Quantization Through Weight Equalization and Bias Correction
  Markus Nagel, M. V. Baalen, Tijmen Blankevoort, Max Welling
  MQ · 11 Jun 2019 · 75/515/0

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
  Zhen Dong, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
  MQ · 29 Apr 2019 · 88/528/0

Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks
  Sambhav R. Jain, Albert Gural, Michael Wu, Chris Dick
  MQ · 19 Mar 2019 · 80/152/0

Learned Step Size Quantization
  S. K. Esser, J. McKinstry, Deepika Bablani, R. Appuswamy, D. Modha
  MQ · 21 Feb 2019 · 75/810/0