Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1510.00149
Cited By
v1
v2
v3
v4
v5 (latest)
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1 October 2015
Song Han
Huizi Mao
W. Dally
3DGS
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"
50 / 3,481 papers shown
Title
Priority-Aware Model-Distributed Inference at Edge Networks
Teng Li
Hulya Seferoglu
98
1
0
16 Dec 2024
Designing Semi-Structured Pruning of Graph Convolutional Networks for Skeleton-based Recognition
Hichem Sahbi
CVBM
121
0
0
16 Dec 2024
MOFHEI: Model Optimizing Framework for Fast and Efficient Homomorphically Encrypted Neural Network Inference
Parsa Ghazvinian
Robert Podschwadt
Prajwal Panzade
Mohammad H. Rafiei
Daniel Takabi
109
0
0
10 Dec 2024
TT-MPD: Test Time Model Pruning and Distillation
Haihang Wu
Wei Wang
T. Malepathirana
Sachith Seneviratne
D. Oetomo
Saman K. Halgamuge
116
0
0
10 Dec 2024
DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators
Taesik Gong
F. Kawsar
Chulhong Min
119
3
0
09 Dec 2024
MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference
Sokratis Nikolaidis
Stylianos I. Venieris
I. Venieris
131
0
0
05 Dec 2024
Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task
Alireza Maleki
Mahsa Lavaei
Mohsen Bagheritabar
Salar Beigzad
Zahra Abadi
MQ
110
0
0
05 Dec 2024
CPTQuant -- A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models
Amitash Nanda
Sree Bhargavi Balija
D. Sahoo
MQ
113
0
0
03 Dec 2024
AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices
Yuzhan Wang
Sicong Liu
Bin Guo
Boqi Zhang
Ke Ma
Yasan Ding
Hao Luo
Yao Li
Zhiwen Yu
124
3
0
01 Dec 2024
Is Oracle Pruning the True Oracle?
Sicheng Feng
Keda Tao
Haoyu Wang
VLM
148
0
0
28 Nov 2024
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
Xiaowen Ma
Zhenliang Ni
Xinghao Chen
Mamba
135
2
0
26 Nov 2024
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
Zhaopeng Tu
VLM
270
0
0
21 Nov 2024
Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning
Andy Li
A. Durrant
Milan Markovic
Lu Yin
Georgios Leontidis
Tianlong Chen
Lu Yin
Georgios Leontidis
182
0
0
20 Nov 2024
SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar
Linfeng Wen
Mingu Kang
45
0
0
15 Nov 2024
P
2
^2
2
Law: Scaling Law for Post-Training After Model Pruning
Xiaodong Chen
Yuxuan Hu
Jing Zhang
Yanling Wang
Cuiping Li
Hong Chen
Jing Zhang
91
0
0
15 Nov 2024
Optimizing Traffic Signal Control using High-Dimensional State Representation and Efficient Deep Reinforcement Learning
Lawrence Francis
Blessed Guda
Ahmed Biyabani
50
0
0
12 Nov 2024
CULL-MT: Compression Using Language and Layer pruning for Machine Translation
Pedram Rostami
M. Dousti
100
1
0
10 Nov 2024
Client Contribution Normalization for Enhanced Federated Learning
Mayank Kumar Kundalwal
Anurag Saraswat
Ishan Mishra
Deepak Mishra
FedML
69
0
0
10 Nov 2024
Learning Morphisms with Gauss-Newton Approximation for Growing Networks
Neal Lawton
Aram Galstyan
Greg Ver Steeg
61
0
0
07 Nov 2024
Flashy Backdoor: Real-world Environment Backdoor Attack on SNNs with DVS Cameras
Roberto Riaño
Gorka Abad
S. Picek
A. Urbieta
AAML
103
0
0
05 Nov 2024
Magnitude Pruning of Large Pretrained Transformer Models with a Mixture Gaussian Prior
Mingxuan Zhang
Y. Sun
F. Liang
120
0
0
01 Nov 2024
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Bram Adams
Ahmed E. Hassan
VLM
144
0
0
01 Nov 2024
Mutual Information Preserving Neural Network Pruning
Charles Westphal
Stephen Hailes
Mirco Musolesi
125
1
0
31 Oct 2024
Offline Behavior Distillation
Shiye Lei
Sen Zhang
Dacheng Tao
OffRL
91
0
0
30 Oct 2024
Efficient Reprogramming of Memristive Crossbars for DNNs: Weight Sorting and Bit Stucking
Matheus Farias
H. T. Kung
MQ
58
0
0
29 Oct 2024
Data Generation for Hardware-Friendly Post-Training Quantization
Lior Dikstein
Ariel Lapid
Arnon Netzer
H. Habi
MQ
484
0
0
29 Oct 2024
MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression
Noel Elias
H. Esfahanizadeh
Kaan Kale
S. Vishwanath
Muriel Médard
114
0
0
28 Oct 2024
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Changwoo Lee
Soo Min Kwon
Qing Qu
Hun-Seok Kim
95
0
0
28 Oct 2024
Deep Insights into Automated Optimization with Large Language Models and Evolutionary Algorithms
He Yu
Qingbin Liu
93
3
0
28 Oct 2024
Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
Yuzhe Yang
Yipeng Du
Ahmad Farhan
Claudio Angione
Yue Zhao
Harry Yang
Fielding Johnston
James Buban
Patrick Colangelo
108
0
0
28 Oct 2024
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao
Yanshuai Cao
Lili Mou
MQ
76
4
0
28 Oct 2024
Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management
Tuowei Wang
Ruwen Fan
Minxing Huang
Zixu Hao
Kun Li
Ting Cao
Youyou Lu
Yaoxue Zhang
Ju Ren
94
2
0
25 Oct 2024
LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices
Chuntao Ding
Xu Cao
Jianhang Xie
Linlin Fan
Shangguang Wang
Zhichao Lu
89
2
0
22 Oct 2024
Mitigating Vanishing Activations in Deep CapsNets Using Channel Pruning
Siddharth Sahu
Abdulrahman Altahhan
3DPC
MedIm
78
0
0
22 Oct 2024
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Guhao Feng
Kai-Bo Yang
Yuntian Gu
Xinyue Ai
Shengjie Luo
Jiacheng Sun
Di He
Hao Sun
Liwei Wang
LRM
92
13
0
17 Oct 2024
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Yanyue Xie
Zhi Zhang
Ding Zhou
Cong Xie
Ziang Song
Xin Liu
Yanzhi Wang
Xue Lin
An Xu
LLMAG
89
5
0
15 Oct 2024
Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars
Matheus Farias
H. T. Kung
64
1
0
15 Oct 2024
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments
Syed Abdul Gaffar Shakhadri
Kruthika KR
Rakshit Aralimatti
VLM
52
0
0
15 Oct 2024
QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models
Zhumazhan Balapanov
Edward Magongo
Vanessa Matvei
Olivia Holmberg
Jonathan Pei
Kevin Zhu
68
0
0
14 Oct 2024
Arrhythmia Classification Using Graph Neural Networks Based on Correlation Matrix
Seungwoo Han
72
0
0
14 Oct 2024
GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation
Dingdong Yang
Yizhi Wang
Konrad Schindler
Ali Mahdavi Amiri
Hao Zhang
100
1
0
13 Oct 2024
t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving
Pengfei Hu
Yuhang Qian
Tianyue Zheng
Ang Li
Zhe Chen
Yue Gao
Xiuzhen Cheng
Jun Luo
73
0
0
13 Oct 2024
Gradient-Free Neural Network Training on the Edge
Dotan Di Castro
O. Joglekar
Shir Kozlovsky
Vladimir Tchuiev
Michal Moshkovitz
MQ
38
0
0
13 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SyDa
177
2
0
13 Oct 2024
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Wenlong Deng
Yize Zhao
V. Vakilian
Minghui Chen
Xiaoxiao Li
Christos Thrampoulidis
236
7
0
12 Oct 2024
Neural Metamorphosis
Xingyi Yang
Xinchao Wang
85
2
0
10 Oct 2024
Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models
Adriana Fernandez-Lopez
Shiwei Liu
L. Yin
Stavros Petridis
Maja Pantic
62
1
0
10 Oct 2024
QoS-Nets: Adaptive Approximate Neural Network Inference
E. Trommer
Bernd Waschneck
Akash Kumar
40
0
0
10 Oct 2024
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing
Sagi Shaier
Francisco Pereira
Katharina von der Wense
Lawrence E Hunter
Matt Jones
MoE
123
0
0
10 Oct 2024
Compressing Large Language Models with Automated Sub-Network Search
R. Sukthanker
B. Staffler
Frank Hutter
Aaron Klein
LRM
81
0
0
09 Oct 2024
Previous
1
2
3
4
5
...
68
69
70
Next