ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Versions: v1, v2, v3 (latest)

1 November 2024
Jaskirat Singh, Bram Adams, Ahmed E. Hassan
VLM
ArXiv (abs) · PDF · HTML

Papers citing "On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance"

23 / 73 papers shown

  1. Low-bit Quantization of Neural Networks for Efficient Inference
     Yoni Choukroun, Eli Kravchik, Fan Yang, P. Kisilev
     MQ · 75 · 364 · 0 · 18 Feb 2019

  2. Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization
     Eldad Meller, Alexander Finkelstein, Uri Almog, Mark Grobman
     MQ · 61 · 87 · 0 · 05 Feb 2019

  3. Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
     Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang
     OODD, MQ · 104 · 311 · 0 · 28 Jan 2019

  4. Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge
     Guangli Li, Lei Liu, Xueying Wang, Xiao-jun Dong, Peng Zhao, Xiaobing Feng
     MQ · 66 · 65 · 0 · 16 Dec 2018

  5. NestDNN: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision
     Biyi Fang, Xiao Zeng, Mi Zhang
     3DH · 83 · 269 · 0 · 23 Oct 2018

  6. Quantization for Rapid Deployment of Deep Neural Networks
     J. Lee, Sangwon Ha, Saerom Choi, Won-Jo Lee, Seungwon Lee
     MQ · 51 · 49 · 0 · 12 Oct 2018

  7. SNIP: Single-shot Network Pruning based on Connection Sensitivity
     Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr
     VLM · 269 · 1,211 · 0 · 04 Oct 2018

  8. Relaxed Quantization for Discretized Neural Networks
     Christos Louizos, M. Reisser, Tijmen Blankevoort, E. Gavves, Max Welling
     MQ · 96 · 132 · 0 · 03 Oct 2018

  9. Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy
     En Li, Zhi Zhou, Xu Chen
     56 · 328 · 0 · 20 Jun 2018

  10. PACT: Parameterized Clipping Activation for Quantized Neural Networks
      Jungwook Choi, Zhuo Wang, Swagath Venkataramani, P. Chuang, Vijayalakshmi Srinivasan, K. Gopalakrishnan
      MQ · 68 · 955 · 0 · 16 May 2018

  11. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
      Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
      ELM · 1.1K · 7,200 · 0 · 20 Apr 2018

  12. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
      Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko
      MQ · 164 · 3,143 · 0 · 15 Dec 2017

  13. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
      Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, ..., Hong Luo, Song Yao, Yu Wang, Huazhong Yang, W. Dally
      78 · 630 · 0 · 01 Dec 2016

  14. Aggregated Residual Transformations for Deep Neural Networks
      Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
      522 · 10,351 · 0 · 16 Nov 2016

  15. Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
      Tien-Ju Yang, Yu-hsin Chen, Vivienne Sze
      3DV · 91 · 742 · 0 · 16 Nov 2016

  16. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
      Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou
      MQ · 129 · 2,090 · 0 · 20 Jun 2016

  17. Wide Residual Networks
      Sergey Zagoruyko, N. Komodakis
      353 · 8,002 · 0 · 23 May 2016

  18. Fixed Point Quantization of Deep Convolutional Networks
      D. Lin, S. Talathi, V. Annapureddy
      MQ · 104 · 816 · 0 · 19 Nov 2015

  19. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
      Song Han, Huizi Mao, W. Dally
      3DGS · 263 · 8,862 · 0 · 01 Oct 2015

  20. Learning both Weights and Connections for Efficient Neural Networks
      Song Han, Jeff Pool, J. Tran, W. Dally
      CVBM · 316 · 6,709 · 0 · 08 Jun 2015

  21. Distilling the Knowledge in a Neural Network
      Geoffrey E. Hinton, Oriol Vinyals, J. Dean
      FedML · 367 · 19,733 · 0 · 09 Mar 2015

  22. Deep Learning with Limited Numerical Precision
      Suyog Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan
      HAI · 207 · 2,049 · 0 · 09 Feb 2015

  23. ImageNet Large Scale Visual Recognition Challenge
      Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
      VLM, ObjD · 1.7K · 39,615 · 0 · 01 Sep 2014