BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache

24 March 2025

Papers citing "BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache"

4 / 4 papers shown

Title
Fast Matrix Multiplications for Lookup Table-Quantized LLMs Han Guo William Brandon Radostin Cholakov Jonathan Ragan-Kelley Eric P. Xing Yoon Kim MQ 169 16 0 20 Jan 2025
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving Zihao Ye Lequn Chen Ruihang Lai Wuwei Lin Yineng Zhang ... Tianqi Chen Baris Kasikci Vinod Grover Arvind Krishnamurthy Luis Ceze 142 35 0 02 Jan 2025
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Chengyue Wu Haotian Tang Shang Yang Zhekai Zhang Guangxuan Xiao Chuang Gan Song Han 172 98 0 07 May 2024
Fast Transformer Decoding: One Write-Head is All You Need Noam M. Shazeer 170 478 0 06 Nov 2019