ResearchTrend.AI
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache


24 March 2025
Dayou Du
Shijie Cao
Jianyi Cheng
Ting Cao
M. Yang
    MQ

Papers citing "BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache"

4 / 4 papers shown
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Han Guo
William Brandon
Radostin Cholakov
Jonathan Ragan-Kelley
Eric P. Xing
Yoon Kim
MQ
169
16
0
20 Jan 2025
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Zihao Ye
Lequn Chen
Ruihang Lai
Wuwei Lin
Yineng Zhang
...
Tianqi Chen
Baris Kasikci
Vinod Grover
Arvind Krishnamurthy
Luis Ceze
142
35
0
02 Jan 2025
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Chengyue Wu
Haotian Tang
Shang Yang
Zhekai Zhang
Guangxuan Xiao
Chuang Gan
Song Han
172
98
0
07 May 2024
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
170
478
0
06 Nov 2019