arXiv:2505.02380
EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
5 May 2025
Arnab Sanyal
Prithwish Mukherjee
Gourav Datta
Sandeep P. Chinchali
Papers citing "EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices" (1 paper)
Fast Transformer Decoding: One Write-Head is All You Need (Noam M. Shazeer, 06 Nov 2019)