FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding
arXiv: 2505.17694 · 23 May 2025
Zhibin Wang, Rui Ning, Chao Fang, Zhonghui Zhang, Xi Lin, Shaobo Ma, Mo Zhou, Xue Li, Zhongfeng Wang, Chengying Huan, Rong Gu, Kun Yang, Guihai Chen, Sheng Zhong, Chen Tian
Papers citing "FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding" (3 papers)
Context Parallelism for Scalable Million-Token Inference
Amy Yang, Jingyi Yang, Aya Ibrahim, Xinfeng Xie, Bangsheng Tang, Grigory Sizov, Jeremy Reizenstein, Jongsoo Park, Jianyu Huang
Topics: MoE, LRM · 04 Nov 2024
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang
26 May 2024
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
06 Nov 2019