Auto-Split: A General Framework of Collaborative Edge-Cloud AI
arXiv 2108.13041 · 30 August 2021
Amin Banitalebi-Dehkordi, Naveen Vedula, J. Pei, Fei Xia, Lanjun Wang, Yong Zhang
Papers citing "Auto-Split: A General Framework of Collaborative Edge-Cloud AI" (40 of 40 shown):
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Bram Adams, Ahmed E. Hassan · [VLM] · 01 Nov 2024 · 0 citations

Synergy: Towards On-Body AI via Tiny AI Accelerator Collaboration on Wearables
Taesik Gong, S. Jang, Utku Günay Acer, F. Kawsar, Chulhong Min · 11 Dec 2023 · 2 citations

SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane · 14 Aug 2020 · 272 citations
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei · [BDL] · 28 May 2020 · 42,055 citations

ZeroQ: A Novel Zero Shot Quantization Framework
Yaohui Cai, Z. Yao, Zhen Dong, A. Gholami, Michael W. Mahoney, Kurt Keutzer · [MQ] · 01 Jan 2020 · 397 citations
Loss Aware Post-training Quantization
Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, A. Bronstein, A. Mendelson · [MQ] · 17 Nov 2019 · 166 citations

Neural Network Distiller: A Python Package For DNN Compression Research
Neta Zmora, Guy Jacob, Lev Zlotnik, Bar Elharar, Gal Novik · 27 Oct 2019 · 74 citations

Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing
En Li, Liekang Zeng, Zhi Zhou, Xu Chen · 04 Oct 2019 · 627 citations
PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units
Yujeong Choi, Minsoo Rhu · 06 Sep 2019 · 129 citations

PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors
Angelo Garofalo, Manuele Rusci, Francesco Conti, D. Rossi, Luca Benini · [MQ] · 29 Aug 2019 · 135 citations

Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming
Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, E. Rusak, Oliver Bringmann, Alexander S. Ecker, Matthias Bethge, Wieland Brendel · [3DPC] · 17 Jul 2019 · 445 citations
Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers
Manuele Rusci, Alessandro Capotondi, Luca Benini · [MQ] · 30 May 2019 · 75 citations

SCAN: A Scalable Neural Networks Framework Towards Compact and Efficient Models
Linfeng Zhang, Zhanhong Tan, Jiebo Song, Jingwei Chen, Chenglong Bao, Kaisheng Ma · 27 May 2019 · 71 citations

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
Zhen Dong, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer · [MQ] · 29 Apr 2019 · 526 citations
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
Dan Hendrycks, Thomas G. Dietterich · [OOD, VLM] · 28 Mar 2019 · 3,435 citations

Learned Step Size Quantization
S. K. Esser, J. McKinstry, Deepika Bablani, R. Appuswamy, D. Modha · [MQ] · 21 Feb 2019 · 802 citations

Low-bit Quantization of Neural Networks for Efficient Inference
Yoni Choukroun, Eli Kravchik, Fan Yang, P. Kisilev · [MQ] · 18 Feb 2019 · 362 citations
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang · [OODD, MQ] · 28 Jan 2019 · 310 citations

ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation
Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, ..., Yiming Wu, Yangqing Jia, Peter Vajda, M. Uyttendaele, N. Jha · 21 Dec 2018 · 272 citations

Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
Bichen Wu, Yanghan Wang, Peizhao Zhang, Yuandong Tian, Peter Vajda, Kurt Keutzer · [MQ] · 30 Nov 2018 · 273 citations
HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Kuan-Chieh Wang, Zhijian Liu, Chengyue Wu, Ji Lin, Song Han · [MQ] · 21 Nov 2018 · 881 citations

SCALE-Sim: Systolic CNN Accelerator Simulator
A. Samajdar, Yuhao Zhu, P. Whatmough, Matthew Mattina, Tushar Krishna · 16 Oct 2018 · 137 citations

Post-training 4-bit quantization of convolution networks for rapid-deployment
Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry · [MQ] · 02 Oct 2018 · 94 citations
Not Just Privacy: Improving Performance of Private Deep Learning in Mobile Cloud
Ji Wang, Jianguo Zhang, Weidong Bao, Xiaomin Zhu, Bokai Cao, Philip S. Yu · 10 Sep 2018 · 195 citations

MnasNet: Platform-Aware Neural Architecture Search for Mobile
Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew G. Howard, Quoc V. Le · [MQ] · 31 Jul 2018 · 3,009 citations

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, G. Hua · [MQ] · 26 Jul 2018 · 703 citations
Quantizing deep convolutional networks for efficient inference: A whitepaper
Raghuraman Krishnamoorthi · [MQ] · 21 Jun 2018 · 1,016 citations

Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
Haichuan Yang, Yuhao Zhu, Ji Liu · [CVBM] · 12 Jun 2018 · 36 citations

PACT: Parameterized Clipping Activation for Quantized Neural Networks
Jungwook Choi, Zhuo Wang, Swagath Venkataramani, P. Chuang, Vijayalakshmi Srinivasan, K. Gopalakrishnan · [MQ] · 16 May 2018 · 953 citations
NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
Tien-Ju Yang, Andrew G. Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam · 09 Apr 2018 · 521 citations

Model compression via distillation and quantization
A. Polino, Razvan Pascanu, Dan Alistarh · [MQ] · 15 Feb 2018 · 731 citations

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko · [MQ] · 15 Dec 2017 · 3,130 citations
Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy
Asit K. Mishra, Debbie Marr · [FedML] · 15 Nov 2017 · 331 citations

A Survey of Model Compression and Acceleration for Deep Neural Networks
Yu Cheng, Duo Wang, Pan Zhou, Zhang Tao · 23 Oct 2017 · 1,095 citations

BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks
Surat Teerapittayanon, Bradley McDanel, H. T. Kung · [UQCV] · 06 Sep 2017 · 1,140 citations

Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors
Hong-Yu Zhou, Bin-Bin Gao, Jianxin Wu · [ObjD] · 20 Jul 2017 · 29 citations
Channel Pruning for Accelerating Very Deep Neural Networks
Yihui He, Xiangyu Zhang, Jian Sun · 19 Jul 2017 · 2,524 citations

DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou · [MQ] · 20 Jun 2016 · 2,088 citations

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han, Huizi Mao, W. Dally · [3DGS] · 01 Oct 2015 · 8,842 citations
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean · [FedML] · 09 Mar 2015 · 19,643 citations