ResearchTrend.AI
arXiv:2410.16663
FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs

22 October 2024
Haoran Lin
Xianzhi Yu
Kang Zhao
Lu Hou
Zongyuan Zhan
Stanislav Kamenev
Han Bao
Ting Hu
Mingkai Wang
Qixin Chang
Siyue Sui
Weihao Sun
Jiaxin Hu
Jun Yao
Zekun Yin
Cheng Qian
Ying Zhang
Yinfei Pan
Yu Yang
Weiguo Liu
    LRM
Papers citing "FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs"

1 / 1 papers shown
Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels
Mingcong Song
Xinru Tang
Fengfan Hou
Jing Li
Wei Wei
...
Hongjie Si
D. Jiang
Shouyi Yin
Yang Hu
Guoping Long
24 Dec 2024