ResearchTrend.AI · Papers · 2411.00907 · Cited By
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
arXiv:2411.00907 (versions v1–v3) · 1 November 2024
Jaskirat Singh, Bram Adams, Ahmed E. Hassan
VLM

Papers citing "On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance"

Showing 50 of 73 citing papers.
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
  Jaskirat Singh, Emad Fallahzadeh, Bram Adams, Ahmed E. Hassan
  MQ · 25 Mar 2024 · 121/3/0

Efficient Post-training Quantization with FP8 Formats
  Haihao Shen, Naveen Mellempudi, Xin He, Q. Gao, Chang-Bao Wang, Mengni Wang
  MQ · 26 Sep 2023 · 58/23/0

Pruning vs Quantization: Which is Better?
  Andrey Kuzmin, Markus Nagel, M. V. Baalen, Arash Behboodi, Tijmen Blankevoort
  MQ · 06 Jul 2023 · 112/51/0

Adaptive DNN Surgery for Selfish Inference Acceleration with On-demand Edge Resource
  Xiang Yang, Deliang Chen, Q. Qi, Jingyu Wang, Haifeng Sun, J. Liao, Song Guo
  21 Jun 2023 · 124/3/0

QuaLA-MiniLM: a Quantized Length Adaptive MiniLM
  Shira Guskin, Moshe Wasserblat, Chang Wang, Haihao Shen
  MQ · 31 Oct 2022 · 58/2/0

Fast DistilBERT on CPUs
  Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat
  VLM · 27 Oct 2022 · 42/2/0

Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
  Rajiv Movva, Jinhao Lei, Shayne Longpre, Ajay K. Gupta, Chris DuBois
  VLM, MQ · 20 Aug 2022 · 59/5/0

An Empirical Study of Challenges in Converting Deep Learning Models
  Moses Openja, Amin Nikanjam, Ahmed Haj Yahmed, Foutse Khomh, Zhen Ming, Zhengyong Jiang
  AAML · 28 Jun 2022 · 98/19/0

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training
  Charbel Sakr, Steve Dai, Rangharajan Venkatesan, B. Zimmer, W. Dally, Brucek Khailany
  MQ · 13 Jun 2022 · 64/41/0

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization
  Peng Hu, Xi Peng, Erik Cambria, M. Aly, Jie Lin
  MQ · 23 May 2022 · 91/61/0

Overcoming Oscillations in Quantization-Aware Training
  Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort
  MQ · 21 Mar 2022 · 172/108/0

SC2 Benchmark: Supervised Compression for Split Computing
  Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt
  16 Mar 2022 · 102/20/0

BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing
  Yoshitomo Matsubara, Davide Callegaro, Sameer Singh, Marco Levorato, Francesco Restuccia
  07 Jan 2022 · 58/41/0

Prune Once for All: Sparse Pre-Trained Language Models
  Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
  VLM · 10 Nov 2021 · 58/85/0

Auto-Split: A General Framework of Collaborative Edge-Cloud AI
  Amin Banitalebi-Dehkordi, Naveen Vedula, J. Pei, Fei Xia, Lanjun Wang, Yong Zhang
  30 Aug 2021 · 64/92/0

Supervised Compression for Resource-Constrained Edge Computing Systems
  Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt
  21 Aug 2021 · 87/58/0

PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation
  Jang-Hyun Kim, Simyung Chang, Nojun Kwak
  25 Jun 2021 · 61/45/0

Post-Training Sparsity-Aware Quantization
  Gil Shomron, F. Gabbay, Samer Kurzum, U. Weiser
  MQ · 23 May 2021 · 76/34/0

Single-Training Collaborative Object Detectors Adaptive to Bandwidth and Computation
  Juliano S. Assine, José Cândido Silveira Santos Filho, Eduardo Valle
  ObjD · 03 May 2021 · 84/8/0

Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference
  B. Hawks, Javier Mauricio Duarte, Nicholas J. Fraser, Alessandro Pappalardo, N. Tran, Yaman Umuroglu
  MQ · 22 Feb 2021 · 52/51/0

Confounding Tradeoffs for Neural Network Quantization
  Sahaj Garg, Anirudh Jain, Joe Lou, Mitchell Nahmias
  MQ · 12 Feb 2021 · 66/18/0

Dynamic Precision Analog Computing for Neural Networks
  Sahaj Garg, Joe Lou, Anirudh Jain, Mitchell Nahmias
  12 Feb 2021 · 63/33/0

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
  Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, F. Yu, Wei Wang, Shi Gu
  MQ · 10 Feb 2021 · 138/444/0

KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization
  Jing Jin, Cai Liang, Tiancheng Wu, Li Zou, Zhiliang Gan
  MQ · 15 Jan 2021 · 55/27/0

Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search
  Mingzhu Shen, Feng Liang, Ruihao Gong, Yuhang Li, Chuming Li, Chen Lin, F. Yu, Junjie Yan, Wanli Ouyang
  MQ · 09 Oct 2020 · 63/39/0

Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks
  Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung
  MQ · 30 Sep 2020 · 62/30/0

Degree-Quant: Quantization-Aware Training for Graph Neural Networks
  Shyam A. Tailor, Javier Fernandez-Marques, Nicholas D. Lane
  GNN, MQ · 11 Aug 2020 · 50/145/0

Neural Compression and Filtering for Edge-assisted Real-time Object Detection in Challenged Networks
  Yoshitomo Matsubara, Marco Levorato
  31 Jul 2020 · 55/54/0

Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming
  Itay Hubara, Yury Nahshan, Y. Hanani, Ron Banner, Daniel Soudry
  MQ · 14 Jun 2020 · 107/128/0

Knowledge Distillation: A Survey
  Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao
  VLM · 09 Jun 2020 · 162/2,986/0

Bayesian Bits: Unifying Quantization and Pruning
  M. V. Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling
  MQ · 14 May 2020 · 79/116/0

Up or Down? Adaptive Rounding for Post-Training Quantization
  Markus Nagel, Rana Ali Amjad, M. V. Baalen, Christos Louizos, Tijmen Blankevoort
  MQ · 22 Apr 2020 · 92/586/0

LSQ+: Improving low-bit quantization through learnable offsets and better initialization
  Yash Bhalgat, Jinwon Lee, Markus Nagel, Tijmen Blankevoort, Nojun Kwak
  MQ · 20 Apr 2020 · 62/222/0

Training with Quantization Noise for Extreme Model Compression
  Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin
  MQ · 15 Apr 2020 · 99/246/0

FastBERT: a Self-distilling BERT with Adaptive Inference Time
  Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
  05 Apr 2020 · 84/360/0

Understanding and Improving Knowledge Distillation
  Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain
  10 Feb 2020 · 88/133/0

Post-Training Piecewise Linear Quantization for Deep Neural Networks
  Jun Fang, Ali Shafiee, Hamzah Abdel-Aziz, D. Thorsley, Georgios Georgiadis, Joseph Hassoun
  MQ · 31 Jan 2020 · 71/147/0

ZeroQ: A Novel Zero Shot Quantization Framework
  Yaohui Cai, Z. Yao, Zhen Dong, A. Gholami, Michael W. Mahoney, Kurt Keutzer
  MQ · 01 Jan 2020 · 98/399/0

QKD: Quantization-aware Knowledge Distillation
  Jangho Kim, Yash Bhalgat, Jinwon Lee, Chirag I. Patel, Nojun Kwak
  MQ · 28 Nov 2019 · 90/66/0

Model Pruning Enables Efficient Federated Learning on Edge Devices
  Yuang Jiang, Shiqiang Wang, Victor Valls, Bongjun Ko, Wei-Han Lee, Kin K. Leung, Leandros Tassiulas
  26 Sep 2019 · 97/463/0

Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence
  Shuiguang Deng, Hailiang Zhao, Weijia Fang, Yuxiang Cai, Schahram Dustdar, Albert Y. Zomaya
  02 Sep 2019 · 100/616/0

Improved Techniques for Training Adaptive Deep Networks
  Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, Gao Huang
  17 Aug 2019 · 69/132/0

Machine Learning at the Network Edge: A Survey
  M. G. Sarwar Murshed, Chris Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain
  31 Jul 2019 · 60/384/0

RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
  AIMat · 26 Jul 2019 · 686/24,557/0

Convergence of Edge Computing and Deep Learning: A Comprehensive Survey
  Xiaofei Wang, Yiwen Han, Victor C. M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen
  19 Jul 2019 · 83/998/0

Wireless Federated Distillation for Distributed Edge Learning with Heterogeneous Data
  Jinhyun Ahn, Osvaldo Simeone, Joonhyuk Kang
  FedML · 05 Jul 2019 · 53/109/0

Data-Free Quantization Through Weight Equalization and Bias Correction
  Markus Nagel, M. V. Baalen, Tijmen Blankevoort, Max Welling
  MQ · 11 Jun 2019 · 75/515/0

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
  Zhen Dong, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
  MQ · 29 Apr 2019 · 88/528/0

Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks
  Sambhav R. Jain, Albert Gural, Michael Wu, Chris Dick
  MQ · 19 Mar 2019 · 80/152/0

Learned Step Size Quantization
  S. K. Esser, J. McKinstry, Deepika Bablani, R. Appuswamy, D. Modha
  MQ · 21 Feb 2019 · 75/810/0