Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015
Song Han
Huizi Mao
W. Dally
    3DGS

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,481 papers shown
Priority-Aware Model-Distributed Inference at Edge Networks
Teng Li
Hulya Seferoglu
98
1
0
16 Dec 2024
Designing Semi-Structured Pruning of Graph Convolutional Networks for Skeleton-based Recognition
Hichem Sahbi
CVBM
121
0
0
16 Dec 2024
MOFHEI: Model Optimizing Framework for Fast and Efficient Homomorphically Encrypted Neural Network Inference
Parsa Ghazvinian
Robert Podschwadt
Prajwal Panzade
Mohammad H. Rafiei
Daniel Takabi
109
0
0
10 Dec 2024
TT-MPD: Test Time Model Pruning and Distillation
Haihang Wu
Wei Wang
T. Malepathirana
Sachith Seneviratne
D. Oetomo
Saman K. Halgamuge
116
0
0
10 Dec 2024
DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators
Taesik Gong
F. Kawsar
Chulhong Min
119
3
0
09 Dec 2024
MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference
Sokratis Nikolaidis
Stylianos I. Venieris
131
0
0
05 Dec 2024
Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task
Alireza Maleki
Mahsa Lavaei
Mohsen Bagheritabar
Salar Beigzad
Zahra Abadi
MQ
110
0
0
05 Dec 2024
CPTQuant -- A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models
Amitash Nanda
Sree Bhargavi Balija
D. Sahoo
MQ
113
0
0
03 Dec 2024
AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices
Yuzhan Wang
Sicong Liu
Bin Guo
Boqi Zhang
Ke Ma
Yasan Ding
Hao Luo
Yao Li
Zhiwen Yu
124
3
0
01 Dec 2024
Is Oracle Pruning the True Oracle?
Sicheng Feng
Keda Tao
Haoyu Wang
VLM
148
0
0
28 Nov 2024
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
Xiaowen Ma
Zhenliang Ni
Xinghao Chen
Mamba
135
2
0
26 Nov 2024
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
VLM
270
0
0
21 Nov 2024
Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning
Andy Li
A. Durrant
Milan Markovic
Lu Yin
Georgios Leontidis
Tianlong Chen
182
0
0
20 Nov 2024
SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar
Linfeng Wen
Mingu Kang
45
0
0
15 Nov 2024
P$^2$ Law: Scaling Law for Post-Training After Model Pruning
Xiaodong Chen
Yuxuan Hu
Jing Zhang
Yanling Wang
Cuiping Li
Hong Chen
91
0
0
15 Nov 2024
Optimizing Traffic Signal Control using High-Dimensional State Representation and Efficient Deep Reinforcement Learning
Lawrence Francis
Blessed Guda
Ahmed Biyabani
50
0
0
12 Nov 2024
CULL-MT: Compression Using Language and Layer pruning for Machine Translation
Pedram Rostami
M. Dousti
100
1
0
10 Nov 2024
Client Contribution Normalization for Enhanced Federated Learning
Mayank Kumar Kundalwal
Anurag Saraswat
Ishan Mishra
Deepak Mishra
FedML
69
0
0
10 Nov 2024
Learning Morphisms with Gauss-Newton Approximation for Growing Networks
Neal Lawton
Aram Galstyan
Greg Ver Steeg
61
0
0
07 Nov 2024
Flashy Backdoor: Real-world Environment Backdoor Attack on SNNs with DVS Cameras
Roberto Riaño
Gorka Abad
S. Picek
A. Urbieta
AAML
103
0
0
05 Nov 2024
Magnitude Pruning of Large Pretrained Transformer Models with a Mixture Gaussian Prior
Mingxuan Zhang
Y. Sun
F. Liang
120
0
0
01 Nov 2024
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Bram Adams
Ahmed E. Hassan
VLM
144
0
0
01 Nov 2024
Mutual Information Preserving Neural Network Pruning
Charles Westphal
Stephen Hailes
Mirco Musolesi
125
1
0
31 Oct 2024
Offline Behavior Distillation
Shiye Lei
Sen Zhang
Dacheng Tao
OffRL
91
0
0
30 Oct 2024
Efficient Reprogramming of Memristive Crossbars for DNNs: Weight Sorting and Bit Stucking
Matheus Farias
H. T. Kung
MQ
58
0
0
29 Oct 2024
Data Generation for Hardware-Friendly Post-Training Quantization
Lior Dikstein
Ariel Lapid
Arnon Netzer
H. Habi
MQ
484
0
0
29 Oct 2024
MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression
Noel Elias
H. Esfahanizadeh
Kaan Kale
S. Vishwanath
Muriel Médard
114
0
0
28 Oct 2024
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Changwoo Lee
Soo Min Kwon
Qing Qu
Hun-Seok Kim
95
0
0
28 Oct 2024
Deep Insights into Automated Optimization with Large Language Models and Evolutionary Algorithms
He Yu
Qingbin Liu
93
3
0
28 Oct 2024
Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
Yuzhe Yang
Yipeng Du
Ahmad Farhan
Claudio Angione
Yue Zhao
Harry Yang
Fielding Johnston
James Buban
Patrick Colangelo
108
0
0
28 Oct 2024
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao
Yanshuai Cao
Lili Mou
MQ
76
4
0
28 Oct 2024
Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management
Tuowei Wang
Ruwen Fan
Minxing Huang
Zixu Hao
Kun Li
Ting Cao
Youyou Lu
Yaoxue Zhang
Ju Ren
94
2
0
25 Oct 2024
LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices
Chuntao Ding
Xu Cao
Jianhang Xie
Linlin Fan
Shangguang Wang
Zhichao Lu
89
2
0
22 Oct 2024
Mitigating Vanishing Activations in Deep CapsNets Using Channel Pruning
Siddharth Sahu
Abdulrahman Altahhan
3DPC MedIm
78
0
0
22 Oct 2024
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Guhao Feng
Kai-Bo Yang
Yuntian Gu
Xinyue Ai
Shengjie Luo
Jiacheng Sun
Di He
Hao Sun
Liwei Wang
LRM
92
13
0
17 Oct 2024
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Yanyue Xie
Zhi Zhang
Ding Zhou
Cong Xie
Ziang Song
Xin Liu
Yanzhi Wang
Xue Lin
An Xu
LLMAG
89
5
0
15 Oct 2024
Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars
Matheus Farias
H. T. Kung
64
1
0
15 Oct 2024
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments
Syed Abdul Gaffar Shakhadri
Kruthika KR
Rakshit Aralimatti
VLM
52
0
0
15 Oct 2024
QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models
Zhumazhan Balapanov
Edward Magongo
Vanessa Matvei
Olivia Holmberg
Jonathan Pei
Kevin Zhu
68
0
0
14 Oct 2024
Arrhythmia Classification Using Graph Neural Networks Based on Correlation Matrix
Seungwoo Han
72
0
0
14 Oct 2024
GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation
Dingdong Yang
Yizhi Wang
Konrad Schindler
Ali Mahdavi Amiri
Hao Zhang
100
1
0
13 Oct 2024
t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving
Pengfei Hu
Yuhang Qian
Tianyue Zheng
Ang Li
Zhe Chen
Yue Gao
Xiuzhen Cheng
Jun Luo
73
0
0
13 Oct 2024
Gradient-Free Neural Network Training on the Edge
Dotan Di Castro
O. Joglekar
Shir Kozlovsky
Vladimir Tchuiev
Michal Moshkovitz
MQ
38
0
0
13 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SyDa
177
2
0
13 Oct 2024
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Wenlong Deng
Yize Zhao
V. Vakilian
Minghui Chen
Xiaoxiao Li
Christos Thrampoulidis
236
7
0
12 Oct 2024
Neural Metamorphosis
Xingyi Yang
Xinchao Wang
85
2
0
10 Oct 2024
Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models
Adriana Fernandez-Lopez
Shiwei Liu
L. Yin
Stavros Petridis
Maja Pantic
62
1
0
10 Oct 2024
QoS-Nets: Adaptive Approximate Neural Network Inference
E. Trommer
Bernd Waschneck
Akash Kumar
40
0
0
10 Oct 2024
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing
Sagi Shaier
Francisco Pereira
Katharina von der Wense
Lawrence E Hunter
Matt Jones
MoE
123
0
0
10 Oct 2024
Compressing Large Language Models with Automated Sub-Network Search
R. Sukthanker
B. Staffler
Frank Hutter
Aaron Klein
LRM
81
0
0
09 Oct 2024