FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding

23 May 2025

Zhibin Wang, Rui Ning, Chao Fang, Zhonghui Zhang, Xi Lin, Shaobo Ma, Mo Zhou, Xue Li, Zhongfeng Wang, Chengying Huan, Rong Gu, Kun Yang, Guihai Chen, Sheng Zhong, Chen Tian

ArXiv (abs) · PDF · HTML

Papers citing "FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding"

3 of 3 citing papers shown

1. Context Parallelism for Scalable Million-Token Inference
   Amy Yang, Jingyi Yang, Aya Ibrahim, Xinfeng Xie, Bangsheng Tang, Grigory Sizov, Jeremy Reizenstein, Jongsoo Park, Jianyu Huang
   Topics: MoE, LRM
   04 Nov 2024

2. CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
   Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang
   26 May 2024

3. Fast Transformer Decoding: One Write-Head is All You Need
   Noam M. Shazeer
   06 Nov 2019