Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.01384
Cited By
On the Compressibility of Quantized Large Language Models
3 March 2024
Yu Mao
Weilan Wang
Hongchao Du
Nan Guan
Chun Jason Xue
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"On the Compressibility of Quantized Large Language Models"
7 / 7 papers shown
Title
Easz: An Agile Transformer-based Image Compression Framework for Resource-constrained IoTs
Yu Mao
Jingzong Li
Jun Wang
Hong Xu
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
24
0
0
03 May 2025
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
43
1
0
08 Jun 2024
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Keivan Alizadeh-Vahid
Iman Mirzadeh
Dmitry Belenko
Karen Khatamifard
Minsik Cho
C. C. D. Mundo
Mohammad Rastegari
Mehrdad Farajtabar
72
112
0
12 Dec 2023
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Longteng Zhang
Xiang Liu
Zeyu Li
Xinglin Pan
Peijie Dong
...
Rui Guo
Xin Wang
Qiong Luo
S. Shi
Xiaowen Chu
41
7
0
07 Nov 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
369
0
13 Mar 2023
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
177
414
0
18 Jan 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
243
4,469
0
23 Jan 2020
1