arXiv: 2411.00907 (v3, latest)
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Bram Adams, Ahmed E. Hassan · VLM
1 November 2024
Papers citing "On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance" (50 of 73 papers shown):
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Emad Fallahzadeh, Bram Adams, Ahmed E. Hassan · MQ · 25 Mar 2024

Efficient Post-training Quantization with FP8 Formats
Haihao Shen, Naveen Mellempudi, Xin He, Q. Gao, Chang-Bao Wang, Mengni Wang · MQ · 26 Sep 2023

Pruning vs Quantization: Which is Better?
Andrey Kuzmin, Markus Nagel, M. V. Baalen, Arash Behboodi, Tijmen Blankevoort · MQ · 06 Jul 2023

Adaptive DNN Surgery for Selfish Inference Acceleration with On-demand Edge Resource
Xiang Yang, Deliang Chen, Q. Qi, Jingyu Wang, Haifeng Sun, J. Liao, Song Guo · 21 Jun 2023
QuaLA-MiniLM: a Quantized Length Adaptive MiniLM
Shira Guskin, Moshe Wasserblat, Chang Wang, Haihao Shen · MQ · 31 Oct 2022

Fast DistilBERT on CPUs
Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat · VLM · 27 Oct 2022

Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
Rajiv Movva, Jinhao Lei, Shayne Longpre, Ajay K. Gupta, Chris DuBois · VLM, MQ · 20 Aug 2022
An Empirical Study of Challenges in Converting Deep Learning Models
Moses Openja, Amin Nikanjam, Ahmed Haj Yahmed, Foutse Khomh, Zhen Ming, Zhengyong Jiang · AAML · 28 Jun 2022

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training
Charbel Sakr, Steve Dai, Rangharajan Venkatesan, B. Zimmer, W. Dally, Brucek Khailany · MQ · 13 Jun 2022

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization
Peng Hu, Xi Peng, Erik Cambria, M. Aly, Jie Lin · MQ · 23 May 2022

Overcoming Oscillations in Quantization-Aware Training
Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort · MQ · 21 Mar 2022
SC2 Benchmark: Supervised Compression for Split Computing
Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt · 16 Mar 2022

BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing
Yoshitomo Matsubara, Davide Callegaro, Sameer Singh, Marco Levorato, Francesco Restuccia · 07 Jan 2022

Prune Once for All: Sparse Pre-Trained Language Models
Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat · VLM · 10 Nov 2021

Auto-Split: A General Framework of Collaborative Edge-Cloud AI
Amin Banitalebi-Dehkordi, Naveen Vedula, J. Pei, Fei Xia, Lanjun Wang, Yong Zhang · 30 Aug 2021

Supervised Compression for Resource-Constrained Edge Computing Systems
Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt · 21 Aug 2021
PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation
Jang-Hyun Kim, Simyung Chang, Nojun Kwak · 25 Jun 2021

Post-Training Sparsity-Aware Quantization
Gil Shomron, F. Gabbay, Samer Kurzum, U. Weiser · MQ · 23 May 2021

Single-Training Collaborative Object Detectors Adaptive to Bandwidth and Computation
Juliano S. Assine, José Cândido Silveira Santos Filho, Eduardo Valle · ObjD · 03 May 2021

Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference
B. Hawks, Javier Mauricio Duarte, Nicholas J. Fraser, Alessandro Pappalardo, N. Tran, Yaman Umuroglu · MQ · 22 Feb 2021

Confounding Tradeoffs for Neural Network Quantization
Sahaj Garg, Anirudh Jain, Joe Lou, Mitchell Nahmias · MQ · 12 Feb 2021
Dynamic Precision Analog Computing for Neural Networks
Sahaj Garg, Joe Lou, Anirudh Jain, Mitchell Nahmias · 12 Feb 2021

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, F. Yu, Wei Wang, Shi Gu · MQ · 10 Feb 2021

KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization
Jing Jin, Cai Liang, Tiancheng Wu, Li Zou, Zhiliang Gan · MQ · 15 Jan 2021

Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search
Mingzhu Shen, Feng Liang, Ruihao Gong, Yuhang Li, Chuming Li, Chen Lin, F. Yu, Junjie Yan, Wanli Ouyang · MQ · 09 Oct 2020
Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks
Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung · MQ · 30 Sep 2020

Degree-Quant: Quantization-Aware Training for Graph Neural Networks
Shyam A. Tailor, Javier Fernandez-Marques, Nicholas D. Lane · GNN, MQ · 11 Aug 2020

Neural Compression and Filtering for Edge-assisted Real-time Object Detection in Challenged Networks
Yoshitomo Matsubara, Marco Levorato · 31 Jul 2020

Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming
Itay Hubara, Yury Nahshan, Y. Hanani, Ron Banner, Daniel Soudry · MQ · 14 Jun 2020

Knowledge Distillation: A Survey
Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao · VLM · 09 Jun 2020
Bayesian Bits: Unifying Quantization and Pruning
M. V. Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling · MQ · 14 May 2020

Up or Down? Adaptive Rounding for Post-Training Quantization
Markus Nagel, Rana Ali Amjad, M. V. Baalen, Christos Louizos, Tijmen Blankevoort · MQ · 22 Apr 2020

LSQ+: Improving low-bit quantization through learnable offsets and better initialization
Yash Bhalgat, Jinwon Lee, Markus Nagel, Tijmen Blankevoort, Nojun Kwak · MQ · 20 Apr 2020

Training with Quantization Noise for Extreme Model Compression
Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin · MQ · 15 Apr 2020

FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju · 05 Apr 2020
Understanding and Improving Knowledge Distillation
Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain · 10 Feb 2020

Post-Training Piecewise Linear Quantization for Deep Neural Networks
Jun Fang, Ali Shafiee, Hamzah Abdel-Aziz, D. Thorsley, Georgios Georgiadis, Joseph Hassoun · MQ · 31 Jan 2020

ZeroQ: A Novel Zero Shot Quantization Framework
Yaohui Cai, Z. Yao, Zhen Dong, A. Gholami, Michael W. Mahoney, Kurt Keutzer · MQ · 01 Jan 2020

QKD: Quantization-aware Knowledge Distillation
Jangho Kim, Yash Bhalgat, Jinwon Lee, Chirag I. Patel, Nojun Kwak · MQ · 28 Nov 2019

Model Pruning Enables Efficient Federated Learning on Edge Devices
Yuang Jiang, Shiqiang Wang, Victor Valls, Bongjun Ko, Wei-Han Lee, Kin K. Leung, Leandros Tassiulas · 26 Sep 2019
Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence
Shuiguang Deng, Hailiang Zhao, Weijia Fang, Yuxiang Cai, Schahram Dustdar, Albert Y. Zomaya · 02 Sep 2019

Improved Techniques for Training Adaptive Deep Networks
Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, Gao Huang · 17 Aug 2019

Machine Learning at the Network Edge: A Survey
M. G. Sarwar Murshed, Chris Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain · 31 Jul 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov · AIMat · 26 Jul 2019

Convergence of Edge Computing and Deep Learning: A Comprehensive Survey
Xiaofei Wang, Yiwen Han, Victor C. M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen · 19 Jul 2019
Wireless Federated Distillation for Distributed Edge Learning with Heterogeneous Data
Jinhyun Ahn, Osvaldo Simeone, Joonhyuk Kang · FedML · 05 Jul 2019

Data-Free Quantization Through Weight Equalization and Bias Correction
Markus Nagel, M. V. Baalen, Tijmen Blankevoort, Max Welling · MQ · 11 Jun 2019

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
Zhen Dong, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer · MQ · 29 Apr 2019

Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks
Sambhav R. Jain, Albert Gural, Michael Wu, Chris Dick · MQ · 19 Mar 2019

Learned Step Size Quantization
S. K. Esser, J. McKinstry, Deepika Bablani, R. Appuswamy, D. Modha · MQ · 21 Feb 2019