ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2504.11651
  4. Cited By
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

15 April 2025
Tianyi Zhang
Yang Sui
Shaochen Zhong
Vipin Chaudhary
Helen Zhou
Anshumali Shrivastava
    MQ
ArXiv (abs)PDFHTML

Papers citing "70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float"

13 / 13 papers shown
Title
Get Experience from Practice: LLM Agents with Record & Replay
Get Experience from Practice: LLM Agents with Record & Replay
Erhu Feng
Wenbo Zhou
Zibin Liu
Le Chen
Yunpeng Dong
...
Yisheng Zhao
Dong Du
Zhichao Hua
Yubin Xia
Haibo Chen
104
0
0
23 May 2025
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu
Yuxuan Sun
Manyi Zhang
Haoli Bai
Xianzhi Yu
Tiezheng Yu
C. Yuan
Lu Hou
MQLRM
111
11
0
07 Apr 2025
Gemma 3 Technical Report
Gemma 3 Technical Report
Gemma Team
Aishwarya B Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
...
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
VLM
189
136
0
25 Mar 2025
Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference
Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference
Patrick Yubeaton
Tareq Mahmoud
Shehab Naga
Pooria Taheri
Tianhua Xia
...
Yasmein Khalil
Sai Qian Zhang
Siddharth Joshi
Chinmay Hegde
Siddharth Garg
MQ
59
2
0
02 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLMVLMOffRLAI4TSLRM
380
2,000
0
22 Jan 2025
LLMCBench: Benchmarking Large Language Model Compression for Efficient
  Deployment
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment
Ge Yang
Changyi He
Jinpei Guo
Jianyu Wu
Yifu Ding
Aishan Liu
Haotong Qin
Pengliang Ji
Xianglong Liu
MQ
77
7
0
28 Oct 2024
NF4 Isn't Information Theoretically Optimal (and that's Good)
NF4 Isn't Information Theoretically Optimal (and that's Good)
Davis Yoshida
MQ
68
10
0
12 Jun 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and
  Acceleration
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDLMQ
106
578
0
01 Jun 2023
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained
  Transformers
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Elias Frantar
Saleh Ashkboos
Torsten Hoefler
Dan Alistarh
MQ
152
1,008
0
31 Oct 2022
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
503
20,376
0
23 Oct 2019
A Study of BFLOAT16 for Deep Learning Training
A Study of BFLOAT16 for Deep Learning Training
Dhiraj D. Kalamkar
Dheevatsa Mudigere
Naveen Mellempudi
Dipankar Das
K. Banerjee
...
Sudarshan Srinivasan
Abhisek Kundu
M. Smelyanskiy
Bharat Kaul
Pradeep Dubey
MQ
92
347
0
29 May 2019
Pointer Sentinel Mixture Models
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
349
2,900
0
26 Sep 2016
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained
  Quantization and Huffman Coding
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han
Huizi Mao
W. Dally
3DGS
263
8,864
0
01 Oct 2015
1