ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 1510.00149 · Cited By
Versions: v1–v5 (v5 latest)

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015
Song Han
Huizi Mao
W. Dally
    3DGS
Links: arXiv (abs) · PDF · HTML
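The pipeline named in the title — prune small-magnitude weights, quantize the survivors to a small shared codebook, then Huffman-code the codebook indices — can be sketched with the standard library alone. This is an illustrative toy, not the authors' code: uniform quantization levels stand in for the paper's k-means-trained codebook, and the helper names and values are hypothetical.

```python
import heapq
from collections import Counter

def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose magnitude falls below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize(weights, n_bins):
    """Map each weight to the nearest of n_bins uniformly spaced levels.

    (The paper fits the codebook with k-means; uniform levels keep the
    sketch dependency-free.)  Returns one integer code per weight.
    """
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (n_bins - 1) if hi > lo else 1.0
    return [round((w - lo) / step) for w in weights]

def huffman_bits(symbols):
    """Total bits needed to Huffman-encode `symbols` (excluding the codebook)."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return len(symbols)  # degenerate tree: 1 bit per symbol
    # Heap entries: (subtree frequency, unique tiebreaker, [(symbol, depth), ...])
    heap = [(f, i, [(s, 0)]) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:
        f1, _, l1 = heapq.heappop(heap)
        f2, _, l2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, uid, [(s, d + 1) for s, d in l1 + l2]))
        uid += 1
    return sum(freq[s] * d for _, _, leaves in heap for s, d in leaves)

# Toy "layer": prune, quantize, then entropy-code the integer codes.
weights = [0.9, -0.05, 0.4, 0.02, -0.8, 0.5, 0.01, -0.4]
codes = quantize(prune(weights, threshold=0.1), n_bins=4)
print(f"{huffman_bits(codes)} bits vs {32 * len(weights)} bits uncompressed")
```

The compression comes from two effects the paper exploits: pruning makes one code (zero) very frequent, and Huffman coding then assigns that frequent code a short bit string.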

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,481 papers shown
Compressing Large Language Models with Automated Sub-Network Search
R. Sukthanker
B. Staffler
Frank Hutter
Aaron Klein
LRM
79
0
0
09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
67
7
0
08 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham
Kun Wan
Yu-Jhe Li
Zeliang Zhang
Daniel Miranda
Ajinkya Kale
Chenliang Xu
100
9
0
08 Oct 2024
Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training
Junxiao Shen
Khadija Khaldi
Enmin Zhou
Hemant Bhaskar Surale
Amy Karlson
42
0
0
08 Oct 2024
Addition is All You Need for Energy-efficient Language Models
Hongyin Luo
Wei Sun
30
7
0
01 Oct 2024
Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging
Ismail Erbas
Vikas Pandey
Aporva Amarnath
Naigang Wang
Karthik Swaminathan
Stefan T. Radev
Xavier Intes
AI4CE
81
1
0
01 Oct 2024
MicroFlow: An Efficient Rust-Based Inference Engine for TinyML
Matteo Carnelos
Francesco Pasti
Nicola Bellotto
73
1
0
28 Sep 2024
Value-Based Deep Multi-Agent Reinforcement Learning with Dynamic Sparse Training
Pihe Hu
Shaolong Li
Zhuoran Li
L. Pan
Longbo Huang
47
0
0
28 Sep 2024
Efficient Noise Mitigation for Enhancing Inference Accuracy in DNNs on Mixed-Signal Accelerators
Seyedarmin Azizi
Mohammad Erfan Sadeghi
M. Kamal
Massoud Pedram
63
2
0
27 Sep 2024
Mitigating Selection Bias with Node Pruning and Auxiliary Options
Hyeong Kyu Choi
Weijie Xu
Chi Xue
Stephanie Eckman
Chandan K. Reddy
92
2
0
27 Sep 2024
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
Shaobo Ma
Chao Fang
Haikuo Shao
Zhongfeng Wang
101
4
0
26 Sep 2024
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
Gongfan Fang
Hongxu Yin
Saurav Muralidharan
Greg Heinrich
Jeff Pool
Jan Kautz
Pavlo Molchanov
Xinchao Wang
73
10
0
26 Sep 2024
SPAQ-DL-SLAM: Towards Optimizing Deep Learning-based SLAM for Resource-Constrained Embedded Platforms
Niraj Pudasaini
Muhammad Abdullah Hanif
Mohamed Bennai
55
0
0
22 Sep 2024
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
Yuxin Wang
Minghua Ma
Zekun Wang
Jingchang Chen
Huiming Fan
Liping Shan
Qing Yang
Dongliang Xu
Ming Liu
Bing Qin
79
4
0
20 Sep 2024
Green Federated Learning: A new era of Green Aware AI
Dipanwita Thakur
Antonella Guzzo
Giancarlo Fortino
Francesco Piccialli
AI4CE
115
5
0
19 Sep 2024
Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye
Grace Chu
Yanfeng Liu
Yichi Zhang
Lukasz Lew
Andrew G. Howard
MQ
63
2
0
14 Sep 2024
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
Yuezhou Hu
Jun-Jie Zhu
Jianfei Chen
136
0
0
13 Sep 2024
Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding
Tianqiao Liu
Zui Chen
Zitao Liu
Mi Tian
Weiqi Luo
LRM
72
3
0
13 Sep 2024
NVRC: Neural Video Representation Compression
Ho Man Kwan
Ge Gao
Fan Zhang
Andrew Gower
David Bull
82
12
0
11 Sep 2024
HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning
Tianyi Chen
Xiaoyi Qu
David Aponte
Colby R. Banbury
Jongwoo Ko
Tianyu Ding
Yong Ma
Vladimir Lyapunov
Ilya Zharkov
Luming Liang
196
2
0
11 Sep 2024
Towards Energy-Efficiency by Navigating the Trilemma of Energy, Latency, and Accuracy
Boyuan Tian
Yihan Pang
Muhammad Huzaifa
Shenlong Wang
Sarita Adve
91
1
0
06 Sep 2024
Panoptic Perception for Autonomous Driving: A Survey
Yunge Li
Lanyu Xu
129
3
0
27 Aug 2024
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li
Xuewen Liu
Dongrong Fu
Jianquan Li
Qingyi Gu
Kurt Keutzer
Zhen Dong
EGVM, VGen, DiffM
186
2
0
26 Aug 2024
Condensed Sample-Guided Model Inversion for Knowledge Distillation
Kuluhan Binici
Shivam Aggarwal
Cihan Acar
N. Pham
K. Leman
Gim Hee Lee
Tulika Mitra
93
1
0
25 Aug 2024
MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning
Seungbeom Hu
ChanJun Park
Andrew Ferraiuolo
Sang-Ki Ko
Jinwoo Kim
Haein Song
Jieung Kim
119
1
0
24 Aug 2024
A Greedy Hierarchical Approach to Whole-Network Filter-Pruning in CNNs
Kiran Purohit
Anurag Parvathgari
Sourangshu Bhattacharya
VLM
72
0
0
22 Aug 2024
Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao
Xiaolong Jin
Kai Wang
Yang You
VGen, DiffM
201
45
0
22 Aug 2024
Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection
Liang Yao
Fan Liu
Chuanyi Zhang
Zhiquan Ou
Ting Wu
VLM
116
5
0
21 Aug 2024
Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism
Guanchen Li
Xiandong Zhao
Lian Liu
Zeping Li
Dong Li
Lu Tian
Jie He
Ashish Sirasao
E. Barsoum
VLM
54
1
0
20 Aug 2024
Diffusion Model for Planning: A Systematic Literature Review
Toshihide Ubukata
Jialong Li
Kenji Tei
DiffM, MedIm
142
9
0
16 Aug 2024
An Effective Information Theoretic Framework for Channel Pruning
Yihao Chen
Zefang Wang
86
3
0
14 Aug 2024
Infra-YOLO: Efficient Neural Network Structure with Model Compression for Real-Time Infrared Small Object Detection
Zhonglin Chen
Anyu Geng
Jianan Jiang
Jiwu Lu
Di Wu
ObjD
49
0
0
14 Aug 2024
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
Tianyu Liu
Yun Li
Qitan Lv
Kai Liu
Jianchen Zhu
Winston Hu
Xingwu Sun
142
20
0
13 Aug 2024
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers
Moritz Scherer
Luka Macan
Victor J. B. Jung
Philip Wiese
Luca Bompani
Francesco Conti
Luca Benini
MoE
80
12
0
08 Aug 2024
AdapMTL: Adaptive Pruning Framework for Multitask Learning Model
Mingcan Xiang
Steven Jiaxun Tang
Qizheng Yang
Hui Guan
Tongping Liu
VLM
77
1
0
07 Aug 2024
Speaker Adaptation for Quantised End-to-End ASR Models
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
78
1
0
07 Aug 2024
Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks
Jaewook Lee
Yoel Park
Seulki Lee
VLM
61
1
0
07 Aug 2024
Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments
Angie Boggust
Venkatesh Sivaraman
Yannick Assogba
Donghao Ren
Dominik Moritz
Fred Hohman
VLM
87
3
0
06 Aug 2024
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong
Lujun Li
Dayou Du
Yuhan Chen
Zhenheng Tang
...
Wei Xue
Wenhan Luo
Qi-fei Liu
Yi-Ting Guo
Xiaowen Chu
MQ
91
10
0
03 Aug 2024
Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
Róisín Luo
Alexandru Drimbarean
Walsh Simon
Colm O'Riordan
MQ
89
1
0
01 Aug 2024
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang
Yuezhou Hu
Guohao Jian
Jun Zhu
Jianfei Chen
107
8
0
30 Jul 2024
Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs
Seungmin Yu
Xiaodie Yi
Hayun Lee
Dongkun Shin
74
1
0
30 Jul 2024
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi
Hyeyoon Lee
Dain Kwon
Sunjong Park
Kyuyeun Kim
Noseong Park
Jinho Lee
MQ
129
2
0
29 Jul 2024
Parameter-Efficient Fine-Tuning via Circular Convolution
Aochuan Chen
Jiashun Cheng
Zijing Liu
Ziqi Gao
Fugee Tsung
Yu-Feng Li
Jia Li
151
3
0
27 Jul 2024
Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li
Yijun Dong
Qi Lei
108
6
0
26 Jul 2024
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu
Benlin Liu
Jiahui Wang
Yuhao Dong
Guangyi Chen
Yongming Rao
Ranjay Krishna
Jiwen Lu
VLM
89
14
0
25 Jul 2024
Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance
Ao Shen
Qiang Wang
Zhiquan Lai
Xionglve Li
Dongsheng Li
ALM, MQ
64
1
0
24 Jul 2024
Accelerating the Low-Rank Decomposed Models
Habib Hajimolahoseini
Walid Ahmed
Austin Wen
Yang Liu
91
0
0
24 Jul 2024
MetaAug: Meta-Data Augmentation for Post-Training Quantization
Cuong Pham
Hoang Anh Dung
Cuong C. Nguyen
Trung Le
Dinh Q. Phung
Gustavo Carneiro
Thanh-Toan Do
MQ
83
0
0
20 Jul 2024
Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
Ruizi Han
Jinglei Tang
106
1
0
19 Jul 2024
Page 1 of 70