1510.00149
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1 October 2015
Song Han
Huizi Mao
W. Dally
3DGS
Papers citing
"Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"
50 / 3,446 papers shown
Title
DeFTX: Denoised Sparse Fine-Tuning for Zero-Shot Cross-Lingual Transfer
Sona Elza Simon
Preethi Jyothi
VLM
14
0
0
21 May 2025
Refining Neural Activation Patterns for Layer-Level Concept Discovery in Neural Network-Based Receivers
Marko Tuononen
Duy Vu
Dani Korpi
Vesa Starck
Ville Hautamäki
12
0
0
21 May 2025
Optimal Client Sampling in Federated Learning with Client-Level Heterogeneous Differential Privacy
Jiahao Xu
Rui Hu
Olivera Kotevska
FedML
26
0
0
19 May 2025
An Overview of Arithmetic Adaptations for Inference of Convolutional Neural Networks on Re-configurable Hardware
Ilkay Wunderlich
Benjamin Koch
Sven Schönfeld
19
2
0
19 May 2025
HarmonE: A Self-Adaptive Approach to Architecting Sustainable MLOps
Hiya Bhatt
Shaunak Biswas
Srinivasan Rakhunathan
Karthik Vaidhyanathan
AI4CE
17
0
0
19 May 2025
Automatic Complementary Separation Pruning Toward Lightweight CNNs
David Levin
Gonen Singer
12
0
0
19 May 2025
QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding
Subrata Biswas
Mohammad Nur Hossain Khan
Bashima Islam
7
0
0
19 May 2025
InfiJanice: Joint Analysis and In-situ Correction Engine for Quantization-Induced Math Degradation in Large Language Models
Zhen Li
Yupeng Su
Songmiao Wang
Runming Yang
C. Xie
...
Ming Li
Jiannong Cao
Yuan Xie
Ngai Wong
Hongxia Yang
MQ
12
0
0
16 May 2025
Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments
Ibne Farabi Shihab
Sanjeda Akter
Anuj Sharma
Mamba
54
0
0
13 May 2025
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen
MQ
56
0
0
13 May 2025
Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry
Mohammed Adnan
Rohan Jain
Ekansh Sharma
Rahul Krishnan
Yani Andrew Ioannou
61
0
0
08 May 2025
PROM: Prioritize Reduction of Multiplications Over Lower Bit-Widths for Efficient CNNs
Lukas Meiner
Jens Mehnert
Alexandru Paul Condurache
MQ
42
0
0
06 May 2025
Efficient Continual Learning in Keyword Spotting using Binary Neural Networks
Quynh Nguyen Phuong Vu
Luciano S. Martinez-Rau
Yuxuan Zhang
Nho-Duc Tran
Bengt Oelmann
Michele Magno
Sebastian Bader
CLL
45
0
0
05 May 2025
FPGA-based Acceleration for Convolutional Neural Networks: A Comprehensive Review
Junye Jiang
Yaan Zhou
Yuanhao Gong
Haoxuan Yuan
Shuanglong Liu
2
0
0
04 May 2025
Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models
Chuan Sun
Han Yu
Lizhen Cui
Xiaoxiao Li
181
0
0
03 May 2025
HMI: Hierarchical Knowledge Management for Efficient Multi-Tenant Inference in Pretrained Language Models
J. Zhang
Rongxiang Weng
Yiming Li
Lidan Shou
Ke Chen
Gang Chen
Qin Xie
Guiming Xie
Xuejian Gong
33
0
0
24 Apr 2025
BackSlash: Rate Constrained Optimized Training of Large Language Models
Jun Wu
Jiangtao Wen
Yuxing Han
39
0
0
23 Apr 2025
Efficient Adaptation of Deep Neural Networks for Semantic Segmentation in Space Applications
Leonardo Olivi
Edoardo Santero Mormile
Enzo Tartaglione
SSeg
35
0
0
22 Apr 2025
Mathematical Programming Models for Exact and Interpretable Formulation of Neural Networks
Masoud Ataei
Edrin Hasaj
Jacob Gipp
Sepideh Forouzi
29
0
0
19 Apr 2025
Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts
Leyang Li
Shilin Lu
Yan Ren
A. Kong
DiffM
56
1
0
17 Apr 2025
Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions
Chaoyue Niu
Yucheng Ding
Junhui Lu
Zhengxiang Huang
Hang Zeng
Yutong Dai
Xuezhen Tu
Chengfei Lv
Fan Wu
Guihai Chen
35
1
0
17 Apr 2025
ConvShareViT: Enhancing Vision Transformers with Convolutional Attention Mechanisms for Free-Space Optical Accelerators
Riad Ibadulla
Thomas M. Chen
C. Reyes-Aldasoro
ViT
34
0
0
15 Apr 2025
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
Tianyi Zhang
Yang Sui
Shaochen Zhong
V. Chaudhary
Xia Hu
Anshumali Shrivastava
MQ
32
1
0
15 Apr 2025
Mamba-Based Ensemble learning for White Blood Cell Classification
Lewis Clifton
X. Tian
D. Palasuwan
Phandee Watanaboonyongcharoen
Ponlapat Rojnuckarin
Nantheera Anantrasirichai
Mamba
51
0
0
15 Apr 2025
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM
LRM
199
5
0
15 Apr 2025
CUT: Pruning Pre-Trained Multi-Task Models into Compact Models for Edge Devices
Jingxuan Zhou
Weidong Bao
Ji Wang
Zhengyi Zhong
32
0
0
14 Apr 2025
Tin-Tin: Towards Tiny Learning on Tiny Devices with Integer-based Neural Network Training
Yi Hu
Jinhang Zuo
Eddie Zhang
Bob Iannucci
Carlee Joe-Wong
37
0
0
13 Apr 2025
Can LLMs Revolutionize the Design of Explainable and Efficient TinyML Models?
Christophe El Zeinaty
W. Hamidouche
Glenn Herrou
D. Ménard
Merouane Debbah
43
0
0
13 Apr 2025
Cycle Training with Semi-Supervised Domain Adaptation: Bridging Accuracy and Efficiency for Real-Time Mobile Scene Detection
Huu-Phong Phan-Nguyen
Anh Dao
T. Nguyen
Tuan Quang
H. Tran
Tinh-Anh Nguyen-Nhu
Huy-Thach Pham
Quan Nguyen
Hoang M. Le
Quang-Vinh Dinh
41
0
0
12 Apr 2025
Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models
Yoojin Jung
Byung Cheol Song
AAML
VLM
MQ
41
0
0
07 Apr 2025
Optimizing Large Language Models: Metrics, Energy Efficiency, and Case Study Insights
Tahniat Khan
Soroor Motie
Sedef Akinli Kocak
Shaina Raza
MQ
44
0
0
07 Apr 2025
Hyperflows: Pruning Reveals the Importance of Weights
Eugen Barbulescu
Antonio Alexoaie
36
0
0
06 Apr 2025
Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability
Vishnu Kabir Chhabra
Mohammad Mahdi Khalili
AI4CE
33
0
0
05 Apr 2025
Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning
Sanghwan Bae
Jiwoo Hong
Min Young Lee
Hanbyul Kim
Jeongyeon Nam
Donghyun Kwak
OffRL
LRM
58
4
0
04 Apr 2025
HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse
Yuwei An
Yihua Cheng
Seo Jin Park
Junchen Jiang
50
1
0
03 Apr 2025
MDP: Multidimensional Vision Model Pruning with Latency Constraint
Xinglong Sun
Barath Lakshmanan
Maying Shen
Shiyi Lan
Jingde Chen
Jose M. Alvarez
VLM
62
0
0
02 Apr 2025
FedPaI: Achieving Extreme Sparsity in Federated Learning via Pruning at Initialization
Haonan Wang
Zichen Liu
Kajimusugura Hoshino
Tuo Zhang
J. Walters
S. Crago
49
0
0
01 Apr 2025
Machine Learning-assisted High-speed Combinatorial Optimization with Ising Machines for Dynamically Changing Problems
Yohei Hamakawa
Tomoya Kashimata
Masaya Yamasaki
Kosuke Tatsumura
AI4CE
42
0
0
31 Mar 2025
Optimization of Layer Skipping and Frequency Scaling for Convolutional Neural Networks under Latency Constraint
Minh David Thao Chan
Ruoyu Zhao
Yukuan Jia
Ruiqing Mao
Sheng Zhou
51
0
0
31 Mar 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
Abdelrahman M. Shaker
Muhammad Maaz
Chenhui Gou
Hamid Rezatofighi
Salman Khan
Fahad Shahbaz Khan
225
0
0
27 Mar 2025
Optimizing Multi-DNN Inference on Mobile Devices through Heterogeneous Processor Co-Execution
Yunquan Gao
Zhiguo Zhang
Praveen Kumar Donta
C. Dehury
Xinbing Wang
Dusit Niyato
Qiyang Zhang
46
0
0
27 Mar 2025
An Efficient Training Algorithm for Models with Block-wise Sparsity
Ding Zhu
Zhiqun Zuo
Mohammad Mahdi Khalili
42
0
0
27 Mar 2025
Boosting Large Language Models with Mask Fine-Tuning
M. Zhang
Yue Bai
Huan Wang
Yizhou Wang
Qihua Dong
Y. Fu
CLL
58
0
0
27 Mar 2025
Lipschitz Constant Meets Condition Number: Learning Robust and Compact Deep Neural Networks
Yangqi Feng
S. J. Lin
Baoyuan Gao
Xian Wei
AAML
78
0
0
26 Mar 2025
A Low-complexity Structured Neural Network Approach to Intelligently Realize Wideband Multi-beam Beamformers
Hansaka Aluvihare
Sivakumar Sivasankar
Xianqi Li
Arjuna Madanayake
Sirani M. Perera
83
0
0
26 Mar 2025
GIViC: Generative Implicit Video Compression
Ge Gao
Siyue Teng
Tianhao Peng
Fan Zhang
David Bull
DiffM
VGen
50
0
0
25 Mar 2025
MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
Xu Han
Yuan Tang
Jinfeng Xu
Xianzhi Li
53
0
0
24 Mar 2025
Temporal Action Detection Model Compression by Progressive Block Drop
Xiaoyong Chen
Yong Guo
Jiaming Liang
Sitong Zhuang
Runhao Zeng
Xiping Hu
55
0
0
21 Mar 2025
Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing
Vishnu Asutosh Dasu
Md. Rafi Ur Rashid
Vipul Gupta
Saeid Tizpaz-Niari
Gang Tan
AAML
54
0
0
20 Mar 2025
PARQ: Piecewise-Affine Regularized Quantization
Lisa Jin
Jianhao Ma
Zechun Liu
Andrey Gromov
Aaron Defazio
Lin Xiao
MQ
43
0
0
19 Mar 2025