ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

17 July 2023
Tri Dao
    LRM
ArXiv (abs) · PDF · HTML

Papers citing "FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning"

50 / 329 papers shown
LBMamba: Locally Bi-directional Mamba
Jingwei Zhang
Xi Han
Hong Qin
Mahdi S. Hosseini
Dimitris Samaras
Mamba
42
0
0
19 Jun 2025
REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing
Kangqi Chen
Andreas Kosmas Kakolyris
Rakesh Nadig
Manos Frouzakis
Nika Mansouri-Ghiasi
Yu Liang
Haiyu Mao
Jisung Park
Mohammad Sadrosadati
Onur Mutlu
RALM
38
0
0
19 Jun 2025
Reranking-based Generation for Unbiased Perspective Summarization
Narutatsu Ri
Nicholas Deas
Kathleen McKeown
OffRL
19
0
0
19 Jun 2025
Early Attentive Sparsification Accelerates Neural Speech Transcription
Zifei Xu
Sayeh Sharify
Hesham Mostafa
T. Webb
W. Yazar
Xin Wang
17
0
0
18 Jun 2025
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Minsoo Kim
Kyuhong Shim
Jungwook Choi
Simyung Chang
VLM
12
0
0
18 Jun 2025
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
Xiaoran Liu
Zhigeng Liu
Zengfeng Huang
Qipeng Guo
Ziwei He
Xipeng Qiu
41
0
0
17 Jun 2025
Multipole Attention for Efficient Long Context Reasoning
Coleman Hooper
Sebastian Zhao
Luca Manolache
Sehoon Kim
Michael W. Mahoney
Y. Shao
Kurt Keutzer
Amir Gholami
OffRL, LRM
21
0
0
16 Jun 2025
Personalizable Long-Context Symbolic Music Infilling with MIDI-RWKV
Christian Zhou-Zheng
Philippe Pasquier
22
0
0
16 Jun 2025
Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems
Tuan Nguyen
Long-Vu Hoang
Huy-Dat Tran
24
0
0
16 Jun 2025
GreedyPrune: Retenting Critical Visual Token Set for Large Vision Language Models
Ruiguang Pei
W. Sun
Zhihui Fu
Jun Wang
VLM
17
0
0
16 Jun 2025
QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm
Qirui Zhou
Shaohui Peng
Weiqiang Xiong
Haixin Chen
Yuanbo Wen
...
Ke Gao
Ruizhi Chen
Yanjun Wu
Chen Zhao
Y. Chen
LRM
27
0
0
14 Jun 2025
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Hui Wei
Dong Yoon Lee
Shubham Rohal
Zhizhang Hu
Shiwei Fang
Shijia Pan
40
0
0
13 Jun 2025
ViSAGe: Video-to-Spatial Audio Generation
Jaeyeon Kim
Heeseung Yun
Gunhee Kim
VGen
35
2
0
13 Jun 2025
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Shuai Wang
Zhenhua Liu
Jiaheng Wei
Xuanwu Yin
Dong Li
E. Barsoum
LRM
80
0
0
11 Jun 2025
GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO
Yiyang Zhao
Huiyu Bai
Xuejiao Zhao
OffRL
29
0
0
10 Jun 2025
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams
Zike Wu
Qi Yan
Xuanyu Yi
Lele Wang
Renjie Liao
3DGS
26
0
0
10 Jun 2025
Can A Gamer Train A Mathematical Reasoning Model?
Andrew Shin
ReLM, LRM
34
0
0
10 Jun 2025
Unlocking Recursive Thinking of LLMs: Alignment via Refinement
Haoke Zhang
Xiaobo Liang
Cunxiang Wang
Juntao Li
Min Zhang
LRM
45
0
0
06 Jun 2025
DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration
Hanzhi Zhang
Heng Fan
Kewei Sha
Yan Huang
Yunhe Feng
22
0
0
06 Jun 2025
Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning
Zhiyuan Ma
Jiayu Liu
Xianzhen Luo
Zhenya Huang
Qingfu Zhu
Wanxiang Che
LLMAG
173
0
0
05 Jun 2025
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu
Zeyu Zhang
Zhexin Li
Xuehai Bai
Yizeng Han
...
Jiahao He
Yuanyu He
F. Wang
Gholamreza Haffari
Bohan Zhuang
VGen, MQ
144
1
0
05 Jun 2025
Log-Linear Attention
Han Guo
Songlin Yang
Tarushii Goel
Eric P. Xing
Tri Dao
Yoon Kim
Mamba
162
1
0
05 Jun 2025
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
J. Oswald
Nino Scherrer
Seijin Kobayashi
Luca Versari
Songlin Yang
...
Guillaume Lajoie
Charlotte Frenkel
Razvan Pascanu
Blaise Agüera y Arcas
João Sacramento
102
1
0
05 Jun 2025
Kinetics: Rethinking Test-Time Scaling Laws
Ranajoy Sadhukhan
Zhuoming Chen
Haizhong Zheng
Yang Zhou
Emma Strubell
Beidi Chen
114
0
0
05 Jun 2025
Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers
Haosong Liu
Yuge Cheng
Zihan Liu
Aiyue Chen
Jing Lin
Yiwu Yao
Chen Chen
Jingwen Leng
Yu Feng
Minyi Guo
VGen
162
0
0
05 Jun 2025
AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism
Zhepei Wei
Wei-Lin Chen
Xinyu Zhu
Yu Meng
OffRL
112
0
0
04 Jun 2025
Learning to Insert [PAUSE] Tokens for Better Reasoning
Eunki Kim
Sangryul Kim
James Thorne
LRM
50
0
0
04 Jun 2025
QKV Projections Require a Fraction of Their Memory
Malik Khalf
Yara Shamshoum
Nitzan Hodos
Yuval Sieradzki
Assaf Schuster
MQ, VLM
68
0
0
03 Jun 2025
Native-Resolution Image Synthesis
Zidong Wang
Lei Bai
Xiangyu Yue
Wanli Ouyang
Yiyuan Zhang
74
0
0
03 Jun 2025
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
Yan Gao
Massimo Roberto Scamarcia
Javier Fernandez-Marques
Mohammad Naseri
Chong Shen Ng
...
Junyan Wang
Zheyuan Liu
Daniel J. Beutel
Lingjuan Lyu
Nicholas D. Lane
ALM
56
1
0
03 Jun 2025
HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference
Ping Gong
Jiawei Yi
Shengnan Wang
Juncheng Zhang
Zewen Jin
...
Tong Yang
Gong Zhang
Renhai Chen
Feng Wu
Cheng Li
57
0
0
03 Jun 2025
InterMamba: Efficient Human-Human Interaction Generation with Adaptive Spatio-Temporal Mamba
Zizhao Wu
Yingying Sun
Yiming Chen
Xiaoling Gu
Ruyu Liu
Jiazhou Chen
Mamba
46
0
0
03 Jun 2025
Leveraging Natural Language Processing to Unravel the Mystery of Life: A Review of NLP Approaches in Genomics, Transcriptomics, and Proteomics
Ella Rannon
David Burstein
AI4TS
23
0
0
02 Jun 2025
TransAct V2: Lifelong User Action Sequence Modeling on Pinterest Recommendation
Xue Xia
Saurabh Vishwas Joshi
Kousik Rajesh
Kangnan Li
Yangyi Lu
Nikil Pancha
Dhruvil Badani
Jiajing Xu
Pong Eksombatchai
33
0
0
02 Jun 2025
Self-Refining Language Model Anonymizers via Adversarial Distillation
Kyuyoung Kim
Hyunjun Jeon
Jinwoo Shin
PILM
78
0
0
02 Jun 2025
Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers
Woomin Song
Sai Muralidhar Jayanthi
S. Ronanki
Kanthashree Mysore Sathyendra
Jinwoo Shin
Aram Galstyan
Shubham Katiyar
S. Bodapati
VLM
47
0
0
01 Jun 2025
SALE: Low-bit Estimation for Efficient Sparse Attention in Long-context LLM Prefilling
Xiaodong Ji
Hailin Zhang
Fangcheng Fu
Bin Cui
31
0
0
30 May 2025
Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking
Yuatyong Chaichana
Thanapat Trachu
Peerat Limkonchotiwat
Konpat Preechakul
Tirasan Khandhawit
Ekapol Chuangsuwanich
MoMe
76
0
0
29 May 2025
RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
Chong Zeng
Yue Dong
Pieter Peers
Hongzhi Wu
Xin Tong
33
0
0
28 May 2025
EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse
Tianyu Guo
Hande Dong
Yichong Leng
Feng Liu
Cheater Lin
Nong Xiao
X. Zhang
RALM
22
0
0
28 May 2025
Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Donghyeon Joo
Helya Hosseini
Ramyad Hadidi
Bahar Asgari
74
0
0
28 May 2025
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
Ce Zhang
Kaixin Ma
Tianqing Fang
Wenhao Yu
Hongming Zhang
Zhisong Zhang
Yaqi Xie
Katia Sycara
Haitao Mi
Dong Yu
VLM
98
0
0
28 May 2025
FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference
Aniruddha Nrusimha
William Brandon
Mayank Mishra
Yikang Shen
Rameswar Panda
Jonathan Ragan-Kelley
Yoon Kim
VLM
22
0
0
28 May 2025
Hardware-Efficient Attention for Fast Decoding
Ted Zadouri
Hubert Strauss
Tri Dao
77
2
0
27 May 2025
SageAttention2++: A More Efficient Implementation of SageAttention2
Jintao Zhang
Xiaoming Xu
Jia Wei
Haofeng Huang
Pengle Zhang
Chendong Xiang
Jun Zhu
Jianfei Chen
MQ, VLM
83
7
0
27 May 2025
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences
Jungyoub Cha
Hyunjong Kim
Sungzoon Cho
VLM
80
0
0
27 May 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao
Keda Tao
Can Qin
Haoxuan You
Yang Sui
Huan Wang
VLM
65
0
0
27 May 2025
Accelerating Prefilling for Long-Context LLMs via Sparse Pattern Sharing
Dan Peng
Zhihui Fu
Zewen Ye
Zhuoran Song
Jun Wang
33
0
0
26 May 2025
Accelerating Nash Learning from Human Feedback via Mirror Prox
D. Tiapkin
Daniele Calandriello
Denis Belomestny
Eric Moulines
Alexey Naumov
Kashif Rasul
Michal Valko
Pierre Ménard
56
0
0
26 May 2025
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
Dingyu Yao
Bowen Shen
Zheng Lin
Wei Liu
Jian Luan
Bin Wang
Weiping Wang
MQ
49
0
0
26 May 2025