FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,508 papers shown
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka
Taichi Iki
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
116
6
0
14 Apr 2025
Towards Quantifying Commonsense Reasoning with Mechanistic Insights
Abhinav Joshi
A. Ahmad
Divyaksh Shukla
Ashutosh Modi
ReLM LRM
84
0
0
14 Apr 2025
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
Yangshen Deng
Zhengxin You
Long Xiang
Qilong Li
Peiqi Yuan
...
Man Lung Yiu
Huan Li
Qiaomu Shen
Rui Mao
Bo Tang
82
0
0
14 Apr 2025
Efficient LLM Serving on Hybrid Real-time and Best-effort Requests
Wan Borui
Zhao Juntao
Jiang Chenyu
Guo Chuanxiong
Wu Chuan
VLM
153
1
0
13 Apr 2025
FVQ: A Large-Scale Dataset and A LMM-based Method for Face Video Quality Assessment
Sijing Wu
Yunhao Li
Ziwen Xu
Yixuan Gao
Huiyu Duan
Wei Sun
Guangtao Zhai
360
1
0
12 Apr 2025
MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints
Yichao Yuan
Lin Ma
Nishil Talati
MoE
107
0
0
12 Apr 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
Wissam Antoun
B. Sagot
Djamé Seddah
MQ
70
1
0
11 Apr 2025
Particle Hit Clustering and Identification Using Point Set Transformers in Liquid Argon Time Projection Chambers
Edgar E. Robles
A. Yankelevich
Wenjie Wu
J. Bian
Pierre Baldi
63
0
0
11 Apr 2025
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
M. Dhouib
Davide Buscaldi
Sonia Vanier
A. Shabou
VLM
107
1
0
11 Apr 2025
Token Level Routing Inference System for Edge Devices
Jianshu She
Wenhao Zheng
Zhengzhong Liu
Hongyi Wang
Eric P. Xing
Huaxiu Yao
Qirong Ho
76
1
0
10 Apr 2025
Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving
Shihong Gao
Wei Wei
Yanyan Shen
Lei Chen
70
1
0
10 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Z. Huang
Zhe Chen
Zijia Zhao
Ziwei Chen
Zongyu Lin
MLLM VLM MoE
393
32
0
10 Apr 2025
Distilling Textual Priors from LLM to Efficient Image Fusion
Ran Zhang
Xuanhua He
Ke Cao
Liu Liu
Li Zhang
Man Zhou
Jie Zhang
90
0
0
09 Apr 2025
CHIME: A Compressive Framework for Holistic Interest Modeling
Yong Bai
Rui Xiang
Kaiyuan Li
Yongxiang Tang
Yanhua Cheng
Xialong Liu
Peng Jiang
Kun Gai
64
1
0
09 Apr 2025
Crafting Query-Aware Selective Attention for Single Image Super-Resolution
Junyoung Kim
Youngrok Kim
Siyeol Jung
Donghyun Min
89
0
0
09 Apr 2025
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Yoshihiro Yamada
ViT
79
0
0
09 Apr 2025
High-Resource Translation: Turning Abundance into Accessibility
Abhiram Reddy Yanampally
46
0
0
08 Apr 2025
SPIRe: Boosting LLM Inference Throughput with Speculative Decoding
Sanjit Neelam
Daniel Heinlein
Vaclav Cvicek
Akshay Mishra
Reiner Pope
LRM
77
0
0
08 Apr 2025
TAGC: Optimizing Gradient Communication in Distributed Transformer Training
Igor Polyakov
Alexey Dukhanov
Egor Spirin
81
0
0
08 Apr 2025
PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning
Xinpeng Ding
Kai Zhang
Jinahua Han
Lanqing Hong
Hang Xu
Xuelong Li
MLLM VLM
498
0
0
08 Apr 2025
Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching
Yanhao Dong
Yubo Miao
Weinan Li
Xiao Zheng
Chao Wang
Feng Lyu
59
0
0
08 Apr 2025
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
Yanbiao Liang
Huihong Shi
Haikuo Shao
Zhongfeng Wang
94
0
0
07 Apr 2025
One-Minute Video Generation with Test-Time Training
Karan Dalal
Daniel Koceja
Gashon Hussein
Jiarui Xu
Yue Zhao
...
Tatsunori Hashimoto
Sanmi Koyejo
Yejin Choi
Yu Sun
Xiaolong Wang
ViT
181
13
0
07 Apr 2025
Window Token Concatenation for Efficient Visual Large Language Models
Yifan Li
Wentao Bao
Botao Ye
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
VLM
97
0
0
05 Apr 2025
PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models
Haofei Yin
Mengbai Xiao
Rouzhou Lu
Xiao Zhang
Dongxiao Yu
Guanghui Zhang
AI4CE
79
0
0
05 Apr 2025
Reasoning on Multiple Needles In A Haystack
Yidong Wang
LRM
52
0
0
05 Apr 2025
Clinical ModernBERT: An efficient and long context encoder for biomedical text
Simon A. Lee
Anthony Wu
Jeffrey N. Chiang
MedIm
95
6
0
04 Apr 2025
Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable
Xin Jin
Simon Niklaus
Zhoutong Zhang
Zhihao Xia
Chunle Guo
Yuting Yang
J. Chen
Chongyi Li
VGen
135
1
0
04 Apr 2025
HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
Yongji Wu
Xueshen Liu
Shuowei Jin
Ceyu Xu
Feng Qian
Ziming Mao
Matthew Lentz
Danyang Zhuo
Ion Stoica
MoMe MoE
100
0
0
04 Apr 2025
FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention
Huangliang Dai
Shixun Wu
Hairui Zhao
Jiajun Huang
Zizhe Jian
Yue Zhu
Haiyang Hu
Zizhong Chen
66
0
0
03 Apr 2025
A Framework for Robust Cognitive Evaluation of LLMs
Karin de Langis
J. Park
Bin Hu
Khanh Chi Le
Andreas Schramm
Michael C. Mensink
Andrew Elfenbein
Dongyeop Kang
86
0
0
03 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM MLLM LRM
279
0
0
03 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
173
0
0
02 Apr 2025
Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding
Sakhinana Sagar Srinivas
Akash Das
Shivam Gupta
Venkataramana Runkana
OffRL
120
1
0
02 Apr 2025
Urban Computing in the Era of Large Language Models
Zhonghang Li
Lianghao Xia
Xubin Ren
J. Tang
Tianyi Chen
Yong-mei Xu
Chenyu Huang
218
0
0
02 Apr 2025
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
Runlong Zhou
Yi Zhang
RALM
88
0
0
02 Apr 2025
ParallelFlow: Parallelizing Linear Transformers via Flow Discretization
Nicola Muca Cirone
C. Salvi
117
2
0
01 Apr 2025
TransMamba: Flexibly Switching between Transformer and Mamba
Yixing Li
Ruobing Xie
Zhen Yang
Xingwu Sun
Shuaipeng Li
...
Zhanhui Kang
Yu Cheng
C. Xu
Di Wang
Jie Jiang
Mamba
152
2
0
31 Mar 2025
Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference
Wei Tao
Bin Zhang
Xiaoyang Qu
Jiguang Wan
Jianzong Wang
128
2
0
30 Mar 2025
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo
Yawei Li
Taolin Zhang
Jiadong Wang
Tao Dai
Shu-Tao Xia
Luca Benini
157
5
0
30 Mar 2025
Large Language and Reasoning Models are Shallow Disjunctive Reasoners
Irtaza Khalid
Amir Masoud Nourollah
Steven Schockaert
LRM
180
1
0
30 Mar 2025
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Xinyu Wang
Linrui Ma
Jerry Huang
Peng Lu
Prasanna Parthasarathi
Xiao-Wen Chang
Boxing Chen
Yufei Cui
KELM
130
1
0
28 Mar 2025
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Tong Nie
Jian Sun
Wei Ma
212
4
0
27 Mar 2025
A Multi-Modal Knowledge-Enhanced Framework for Vessel Trajectory Prediction
Haomin Yu
Tianyi Li
Kristian Torp
Christian S. Jensen
55
1
0
27 Mar 2025
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
Dongchen Lu
Yuyao Sun
Zilu Zhang
Leping Huang
Jianliang Zeng
Mao Shu
Huo Cao
140
4
0
27 Mar 2025
Inductive Link Prediction on N-ary Relational Facts via Semantic Hypergraph Reasoning
Gongzhu Yin
Hao Zhang
Yuchen Yang
Yihao Luo
LRM
112
1
0
26 Mar 2025
Named Entity Recognition in Context
Colin Brisson
Ayoub Kahfy
Marc Bui
Frédéric Constant
138
0
0
26 Mar 2025
UniEDU: A Unified Language and Vision Assistant for Education Applications
Zhendong Chu
Jian Xie
Shen Wang
Ziyi Wang
Qingsong Wen
AI4Ed
150
0
0
26 Mar 2025
Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder
Changye Li
Weizhe Xu
Serguei V. S. Pakhomov
Ellen Bradley
Dror Ben-Zeev
T. Cohen
109
0
0
25 Mar 2025
GIViC: Generative Implicit Video Compression
Ge Gao
Siyue Teng
Tianhao Peng
Fan Zhang
David Bull
DiffM VGen
79
0
0
25 Mar 2025