Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.06219
Cited By
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
10 May 2024
Haojie Duanmu
Zhihang Yuan
Xiuhong Li
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models"
5 / 5 papers shown
Title
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun
Ruikang Liu
Haoli Bai
Han Bao
Kang Zhao
...
Lu Hou
Chun Yuan
Xin Jiang
Wen Liu
Jun Yao
MQ
96
4
0
12 Oct 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Wenqi Shao
Mengzhao Chen
Zhaoyang Zhang
Peng Xu
Lirui Zhao
Zhiqiang Li
Kaipeng Zhang
Peng Gao
Yu Qiao
Ping Luo
MQ
48
185
0
25 Aug 2023
H
2
_2
2
O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
103
275
0
24 Jun 2023
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei
Yunchen Zhang
Xiangguo Zhang
Ruihao Gong
Shanghang Zhang
Qi Zhang
F. Yu
Xianglong Liu
MQ
57
150
0
27 Sep 2022
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
79
3,996
0
10 Apr 2020
1