2503.18773
Cited By
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache
24 March 2025
Dayou Du, Shijie Cao, Jianyi Cheng, Ting Cao, M. Yang
MQ
Papers citing
"BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache"
4 / 4 papers shown
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Han Guo, William Brandon, Radostin Cholakov, Jonathan Ragan-Kelley, Eric P. Xing, Yoon Kim
MQ
169
16
0
20 Jan 2025
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, ..., Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze
142
35
0
02 Jan 2025
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Chengyue Wu, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han
172
98
0
07 May 2024
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
170
478
0
06 Nov 2019