arXiv:2312.11918
A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library
19 December 2023
Ganesh Bikshandi, Jay Shah

Papers citing "A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library" (3 of 3 papers shown)

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang, Jia Wei, Pengle Zhang, Jun-Jie Zhu, Jun Zhu, Jianfei Chen
VLM, MQ
03 Oct 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun
29 Jul 2024

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar
VLM
07 May 2024