ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,431 papers shown
V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes
Yanming Zhang
Jun-Kun Chen
Jipeng Lyu
Yu-Xiong Wang
DiffM
VGen
53
0
0
13 Mar 2025
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
Chenpeng Wu
Qiqi Gu
Heng Shi
Jianguo Yao
Haibing Guan
MoE
53
0
0
13 Mar 2025
Autoregressive Image Generation with Randomized Parallel Decoding
Haopeng Li
Jinyue Yang
Guoqi Li
Huan Wang
55
0
0
13 Mar 2025
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Zhengyao Lv
Chenyang Si
Junhao Song
Zhenyu Yang
Yu Qiao
Ziwei Liu
Kwan-Yee K. Wong
VGen
DiffM
84
8
0
13 Mar 2025
Beyond Atoms: Enhancing Molecular Pretrained Representations with 3D Space Modeling
Shuqi Lu
Xiaohong Ji
Bohang Zhang
Lin Yao
Siyuan Liu
Zhifeng Gao
Linfeng Zhang
Guolin Ke
AI4CE
46
1
0
13 Mar 2025
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Marianne Arriola
Aaron Gokaslan
Justin T Chiu
Zhihan Yang
Zhixuan Qi
Jiaqi Han
S. Sahoo
Volodymyr Kuleshov
DiffM
75
5
0
12 Mar 2025
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
Haoyu Zhang
Qiaohui Chu
Meng Liu
Yunxiao Wang
Bin Wen
Fan Yang
Tingting Gao
Di Zhang
Yaowei Wang
Liqiang Nie
EgoV
75
0
0
12 Mar 2025
RigoChat 2: an adapted language model to Spanish using a bounded dataset and reduced hardware
Gonzalo Santamaría Gómez
Guillem García Subies
Pablo Gutiérrez Ruiz
Mario González Valero
Natàlia Fuertes
...
Nuria Aldama García
David Betancur Sánchez
Kateryna Sushkova
Marta Guerrero Nieto
Á. Jiménez
51
0
0
11 Mar 2025
AI-native Memory 2.0: Second Me
Jiale Wei
Xiang Ying
Tao Gao
Fangyi Bao
Felix Tao
Jingbo Shang
59
1
0
11 Mar 2025
Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models
I. Cho
Youngbeom Yoo
Subin Jeon
Seon Joo Kim
DiffM
62
0
0
11 Mar 2025
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
Pol G. Recasens
Ferran Agullo
Yue Zhu
Chen Wang
Eun Kyung Lee
Olivier Tardieu
Jordi Torres
Josep Ll. Berral
48
0
0
11 Mar 2025
LiSu: A Dataset and Method for LiDAR Surface Normal Estimation
Dušan Malić
Christian Fruhwirth-Reisinger
Samuel Schulter
Horst Possegger
3DV
57
0
0
11 Mar 2025
Queueing, Predictions, and LLMs: Challenges and Open Problems
Michael Mitzenmacher
Rana Shahout
AI4TS
LRM
41
1
0
10 Mar 2025
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
Jiacheng Liu
Chang Zou
Yuanhuiyi Lyu
Junjie Chen
Linfeng Zhang
DiffM
63
1
0
10 Mar 2025
Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation
Junhao Zhang
Richong Zhang
Fanshuang Kong
Ziyang Miao
Yanhan Ye
Yaowei Zheng
SyDa
46
0
0
10 Mar 2025
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo
Yingying Zhang
Xiaoyu Yang
Kang Wu
Qi Zhu
Lei Liang
Jingdong Chen
Yansheng Li
67
1
0
10 Mar 2025
Slim attention: cut your context memory in half without loss of accuracy -- K-cache is all you need for MHA
Nils Graef
Andrew Wasielewski
40
1
0
07 Mar 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Siyang Song
Mohammed Irfan Kurpath
Sahal Shaji Mullappilly
Jean Lahoud
Fahad A Khan
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
AuLLM
150
0
0
06 Mar 2025
Shifting Long-Context LLMs Research from Input to Output
Yuhao Wu
Yushi Bai
Zhiqing Hu
Shangqing Tu
Ming Shan Hee
Juanzi Li
Roy Ka-Wei Lee
65
0
0
06 Mar 2025
DIMSUM: Discourse in Mathematical Reasoning as a Supervision Module
Krish Sharma
Niyar R. Barman
Nicholas M. Asher
Akshay Chaturvedi
LRM
AIMat
69
7
0
06 Mar 2025
Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size
Alireza Behtash
Marijan Fofonjka
Ethan Baird
Tyler Mauer
Hossein Moghimifam
David Stout
Joel Dennison
MQ
58
1
0
06 Mar 2025
L²M: Mutual Information Scaling Law for Long-Context Language Modeling
Zhuo Chen
Oriol Mayné i Comas
Zhuotao Jin
Di Luo
Marin Soljacic
67
0
0
06 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
77
0
0
04 Mar 2025
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
Yujiao Yang
Jing Lian
Linhui Li
MoE
82
0
0
04 Mar 2025
Optimizing open-domain question answering with graph-based retrieval augmented generation
Joyce Cahoon
Prerna Singh
Nick Litombe
Jonathan Larson
Ha Trinh
Yiwen Zhu
A. Mueller
Fotis Psallidas
Carlo Curino
34
0
0
04 Mar 2025
FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference
Hongchao Du
Shangyu Wu
Arina Kharlamova
Nan Guan
Chun Jason Xue
51
1
0
04 Mar 2025
Attention Condensation via Sparsity Induced Regularized Training
Eli Sason
Darya Frolova
Boris Nazarov
Felix Goldberd
183
0
0
03 Mar 2025
Alchemist: Towards the Design of Efficient Online Continual Learning System
Yuyang Huang
Yuhan Liu
Haryadi S. Gunawi
Beibin Li
Changho Hwang
CLL
OnRL
101
0
0
03 Mar 2025
Advancing MAPF towards the Real World: A Scalable Multi-Agent Realistic Testbed (SMART)
Jingtian Yan
Zhifei Li
William Kang
Yulun Zhang
Stephen Smith
Jiaoyang Li
48
0
0
03 Mar 2025
Position: Ensuring mutual privacy is necessary for effective external evaluation of proprietary AI systems
Ben Bucknall
Robert F. Trager
Michael A. Osborne
80
0
0
03 Mar 2025
Structural Deep Encoding for Table Question Answering
Raphael Mouravieff
Benjamin Piwowarski
Sylvain Lamprier
LMTD
49
0
0
03 Mar 2025
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Anh Tong
Thanh Nguyen-Tang
Dongeun Lee
Duc Nguyen
Toan M. Tran
David Hall
Cheongwoong Kang
Jaesik Choi
35
0
0
03 Mar 2025
Flow Matching for Medical Image Synthesis: Bridging the Gap Between Speed and Quality
M. Yazdani
Yasamin Medghalchi
Pooria Ashrafian
I. Hacihaliloglu
Dena Shahriari
MedIm
37
0
0
01 Mar 2025
Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving
Qihui Zhou
Peiqi Yin
Pengfei Zuo
James Cheng
CLL
40
1
0
01 Mar 2025
Training-free and Adaptive Sparse Attention for Efficient Long Video Generation
Yifei Xia
Suhan Ling
Fangcheng Fu
Y. Wang
Huixia Li
Xuefeng Xiao
Tengjiao Wang
VGen
65
2
0
28 Feb 2025
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai
Jianqiao Lu
Yao Luo
Yiyuan Ma
Xun Zhou
71
5
0
28 Feb 2025
FANformer: Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong
Bernard Ghanem
Xue Jiang
Yongding Tao
Kechi Zhang
...
Huanyu Liu
Jiazheng Ding
Jia Li
Jinliang Deng
Hong Mei
AI4TS
41
0
0
28 Feb 2025
ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
Hao Ge
Junda Feng
Qi Huang
Fangcheng Fu
Xiaonan Nie
Lei Zuo
Yanghua Peng
Tengjiao Wang
Xin Liu
47
2
0
28 Feb 2025
Implicit Search via Discrete Diffusion: A Study on Chess
Jiacheng Ye
Zhenyu Wu
Jiahui Gao
Zhiyong Wu
Xin Jiang
Z. Li
Lingpeng Kong
DiffM
50
2
0
27 Feb 2025
Training LLMs with MXFP4
Albert Tseng
Tao Yu
Youngsuk Park
34
1
0
27 Feb 2025
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Keisuke Kamahori
Jungo Kasai
Noriyuki Kojima
Baris Kasikci
34
0
0
27 Feb 2025
HDEE: Heterogeneous Domain Expert Ensemble
Oğuzhan Ersoy
Jari Kolehmainen
Gabriel Passamani Andrade
MoE
45
0
0
26 Feb 2025
Self-Adjust Softmax
Chuanyang Zheng
Yihang Gao
Guoxuan Chen
Han Shi
Jing Xiong
Xiaozhe Ren
Chao Huang
Xin Jiang
Z. Li
Yu-Hu Li
50
0
0
25 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
72
0
0
24 Feb 2025
CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
Yepeng Weng
Dianwen Mei
Huishi Qiu
Xujie Chen
Li Liu
Jiang Tian
Zhongchao Shi
53
0
0
24 Feb 2025
AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms
Feiyang Chen
Yu Cheng
Lei Wang
Yuqing Xia
Ziming Miao
...
Fan Yang
Jinbao Xue
Zhi Yang
M. Yang
H. Chen
81
1
0
24 Feb 2025
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
94
3
0
24 Feb 2025
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving
Zeyu Yang
Nan Song
Wei Li
Xiatian Zhu
L. Zhang
Philip H. S. Torr
79
4
0
24 Feb 2025
Training a Generally Curious Agent
Fahim Tajwar
Yiding Jiang
Abitha Thankaraj
Sumaita Sadia Rahman
J. Zico Kolter
Jeff Schneider
Ruslan Salakhutdinov
120
1
0
24 Feb 2025
MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
Zhongwei Wan
H. Shen
Xin Wang
Junfeng Fang
Zheda Mai
M. Zhang
VLM
65
3
0
24 Feb 2025