2410.16663
FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs
22 October 2024
Haoran Lin, Xianzhi Yu, Kang Zhao, Lu Hou, Zongyuan Zhan, Stanislav Kamenev, Han Bao, Ting Hu, Mingkai Wang, Qixin Chang, Siyue Sui, Weihao Sun, Jiaxin Hu, Jun Yao, Zekun Yin, Cheng Qian, Ying Zhang, Yinfei Pan, Yu Yang, Weiguo Liu
LRM
Papers citing
"FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs"
1 / 1 papers shown
Title
Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels
Mingcong Song, Xinru Tang, Fengfan Hou, Jing Li, Wei Wei, ..., Hongjie Si, D. Jiang, Shouyi Yin, Yang Hu, Guoping Long
24 Dec 2024