FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

2 January 2025
Zihao Ye
Lequn Chen
Ruihang Lai
Wuwei Lin
Yineng Zhang
Stephanie Wang
Tianqi Chen
Baris Kasikci
Vinod Grover
Arvind Krishnamurthy
Luis Ceze

Papers citing "FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving"

9 / 59 papers shown
Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective
Hengrui Zhang
Zhongming Yu
Guohao Dai
Guyue Huang
Yufei Ding
Yuan Xie
Yu Wang
GNN
66
48
0
18 Oct 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press
Noah A. Smith
M. Lewis
355
780
0
27 Aug 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
415
2,572
0
20 Apr 2021
FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks
Md. Khaledur Rahman
Majedul Haque Sujon
A. Azad
FedML, GNN
79
51
0
07 Nov 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM, VLM
300
4,120
0
10 Apr 2020
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
190
493
0
06 Nov 2019
Online normalizer calculation for softmax
Maxim Milakov
N. Gimelshein
135
95
0
08 May 2018
Block-Sparse Recurrent Neural Networks
Sharan Narang
Eric Undersander
G. Diamos
69
139
0
08 Nov 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
1.2K
133,803
0
12 Jun 2017