FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
VLM
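The paper this page indexes computes standard softmax attention, softmax(QK^T/sqrt(d))V, exactly, but in tiles, so the full N x N score matrix is never materialized in GPU memory. As a rough illustration of that idea only (this is not the authors' fused CUDA kernel; the function name, block size, and NumPy setting here are illustrative assumptions), a minimal sketch of blockwise attention with an online softmax:

```python
import numpy as np

def blockwise_attention(Q, K, V, block_size=64):
    """Exact attention computed over key/value blocks with a running
    (online) softmax, so the N x N score matrix is never stored.
    Illustrative sketch only; the real FlashAttention kernel also tiles
    over queries and fuses everything into one GPU kernel."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(V)
    row_max = np.full(N, -np.inf)   # running max of scores per query
    row_sum = np.zeros(N)           # running softmax denominator per query

    for start in range(0, N, block_size):
        Kb = K[start:start + block_size]        # (B, d) key block
        Vb = V[start:start + block_size]        # (B, d) value block
        scores = (Q @ Kb.T) * scale             # (N, B) scores for this block only

        new_max = np.maximum(row_max, scores.max(axis=1))
        correction = np.exp(row_max - new_max)  # rescale previously accumulated stats
        p = np.exp(scores - new_max[:, None])   # (N, B) unnormalized probabilities

        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ Vb
        row_max = new_max

    return out / row_sum[:, None]

# Sanity check against the naive dense computation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
dense = np.exp((Q @ K.T) / np.sqrt(32))
dense = (dense / dense.sum(axis=1, keepdims=True)) @ V
assert np.allclose(blockwise_attention(Q, K, V), dense, atol=1e-6)
```

The final lines check that the blockwise result matches exact dense attention. In the actual FlashAttention kernel the same bookkeeping is done on-chip while tiling over both queries and keys, which is where the memory-traffic (IO) savings in the paper's title come from.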

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

Showing 50 of 1,508 citing papers. Each entry lists the paper title, its authors ("..." marks authors elided by the listing), any topic tags shown, the listing's three numeric counters, and the date.
FAST: Fast Audio Spectrogram Transformer
Anugunj Naman, Gaibo Zhang
63 · 0 · 0 · 03 Jan 2025

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, ..., Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze
133 · 35 · 0 · 02 Jan 2025

DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian, Jing Han, Chengcheng Wang, Yuchen Liang, Chao Xu, Hanting Chen
DiffM
149 · 2 · 0 · 31 Dec 2024

VMamba: Visual State Space Model
Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, Jianbin Jiao, Yunfan Liu
Mamba
314 · 725 · 0 · 31 Dec 2024

TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication
Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang
LRM
130 · 0 · 0 · 29 Dec 2024
AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft Structures
Situo Zhang, Hankun Wang, Da Ma, Zichen Zhu, Lu Chen, Kunyao Lan, Kai Yu
105 · 6 · 0 · 25 Dec 2024

Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels
Mingcong Song, Xinru Tang, Fengfan Hou, Jing Li, Wei Wei, ..., Hongjie Si, Dengyang Jiang, Shouyi Yin, Yang Hu, Guoping Long
71 · 3 · 0 · 24 Dec 2024

PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging
Mattias Paul Heinrich
3DPC
93 · 0 · 0 · 23 Dec 2024

ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition
Seungdong Yoa, Seungjun Lee, Hyeseung Cho, Bumsoo Kim, Woohyung Lim
ViT
98 · 0 · 0 · 21 Dec 2024

Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, Han Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu
147 · 6 · 0 · 21 Dec 2024
WebLLM: A High-Performance In-Browser LLM Inference Engine
Charlie F. Ruan, Yucheng Qin, Xun Zhou, Ruihang Lai, Hongyi Jin, ..., Yiyan Zhai, Sudeep Agarwal, Hangrui Cao, Siyuan Feng, Tianqi Chen
LRM
120 · 4 · 0 · 20 Dec 2024

Pipeline Analysis for Developing Instruct LLMs in Low-Resource Languages: A Case Study on Basque
Ander Corral, Ixak Sarasua, Xabier Saralegi
72 · 2 · 0 · 18 Dec 2024

Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
Lifeng Qiao, Peng Ye, Yuchen Ren, Weiqiang Bai, Chaoqi Liang, Xinzhu Ma, Nanqing Dong, W. Ouyang
137 · 3 · 0 · 18 Dec 2024

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, ..., Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli
162 · 130 · 0 · 18 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Yining Qi, Yunfeng Fan, Qinliang Su, Xuemin Shen
AI4CE
181 · 2 · 0 · 18 Dec 2024

FlexCache: Flexible Approximate Cache System for Video Diffusion
Desen Sun, Henry Tian, Tim Lu, Sihang Liu
DiffM
160 · 1 · 0 · 18 Dec 2024

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang, Krishnateja Killamsetty, Shivchander Sudalairaj, ..., Guangxuan Xu, Kai Xu, Ligong Han, Luke Inglis, Akash Srivastava
197 · 7 · 0 · 17 Dec 2024

Echo: Simulating Distributed Training At Scale
Yicheng Feng, Yuetao Chen, Kaiwen Chen, Jingzong Li, Tianyuan Wu, Peng Cheng, Chuan Wu, Wei Wang, Tsung-Yi Ho, Hong Xu
129 · 2 · 0 · 17 Dec 2024
ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models
Xiechi Zhang, Shunfan Zheng, Linlin Wang, Gerard de Melo, Zhu Cao, Xiaoling Wang, Liang He
ELM
149 · 0 · 0 · 16 Dec 2024

Attention with Dependency Parsing Augmentation for Fine-Grained Attribution
Qiang Ding, Lvzhou Luo, Yixuan Cao, Ping Luo
150 · 0 · 0 · 16 Dec 2024

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Zhao Jin, Dacheng Tao
VGen
252 · 1 · 0 · 16 Dec 2024

Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai, Nik Bear Brown
AI4CE
143 · 0 · 0 · 13 Dec 2024
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, ..., Peizhao Zhang, Tingbo Hou, Peter Vajda, N. Jha, Xiaoliang Dai
LMTD, VGen, VLM, DiffM
197 · 11 · 0 · 13 Dec 2024

jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
Andreas Koukounas, Georgios Mastrapas, Bo Wang, Mohammad Kalim Akram, Sedigheh Eslami, Michael Gunther, Isabelle Mohr, Saba Sturua, Scott Martens, Nan Wang
VLM
353 · 10 · 0 · 11 Dec 2024

Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models
Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, ..., Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang, Di Zhang
122 · 0 · 0 · 10 Dec 2024

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
Baorui Ma, Huachen Gao, Haoge Deng, Zhengxiong Luo, Tiejun Huang, Lulu Tang, Xinlong Wang
DiffM, VGen
266 · 16 · 0 · 09 Dec 2024
Flex Attention: A Programming Model for Generating Optimized Attention Kernels
Juechu Dong, Boyuan Feng, Driss Guessous, Yanbo Liang, Horace He
140 · 30 · 0 · 07 Dec 2024

MIND: Effective Incorrect Assignment Detection through a Multi-Modal Structure-Enhanced Language Model
Yunhe Pang, Bo Chen, Fanjin Zhang, Yanghui Rao, Jie Tang
131 · 0 · 0 · 05 Dec 2024

Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang, Yuwei Hu, Runyuan Zhao, John C. S. Lui, Haibo Chen
MQ
286 · 7 · 0 · 04 Dec 2024

Does Few-Shot Learning Help LLM Performance in Code Synthesis?
Derek Xu, Tong Xie, Botao Xia, Haoyu Li, Yunsheng Bai, Yizhou Sun, Wei Wang
192 · 1 · 0 · 03 Dec 2024

The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024
Shuoyi Zhou, Yixuan Zhou, Weiqing Li, Jun Chen, Runchuan Ye, Weihao Wu, Zijian Lin, Shun Lei, Zhiyong Wu
169 · 1 · 0 · 02 Dec 2024
02 Dec 2024
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
Chunlin Yu
Hanqing Wang
Ye Shi
Haoyang Luo
Sibei Yang
Jingyi Yu
Jingya Wang
LRMLM&Ro
204
3
0
02 Dec 2024
Token Cropr: Faster ViTs for Quite a Few Tasks
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner
C. Lippert
Aravindh Mahendran
ViTVLM
104
1
0
01 Dec 2024
ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain
ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain
Ali Shiraee Kasmaee
Mohammad Khodadad
Mohammad Arshi Saloot
Nick Sherck
Stephen Dokas
H. Mahyar
Soheila Samiee
ELM
636
2
0
30 Nov 2024
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
120
16
0
29 Nov 2024
Structured Object Language Modeling (SoLM): Native Structured Objects
  Generation Conforming to Complex Schemas with Self-Supervised Denoising
Structured Object Language Modeling (SoLM): Native Structured Objects Generation Conforming to Complex Schemas with Self-Supervised Denoising
A. Tavanaei
Kee Kiat Koo
Hayreddin Ceker
Shaobai Jiang
Qi Li
Julien Han
Karim Bouyarmane
99
1
0
28 Nov 2024
Any-Resolution AI-Generated Image Detection by Spectral Learning
Dimitrios Karageorgiou, Symeon Papadopoulos, I. Kompatsiaris, Efstratios Gavves
176 · 1 · 0 · 28 Nov 2024

Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan, Zhuang Wang, Zhen Jia, Can Karakus, Luca Zancato, Tri Dao, Ravi Netravali, Yida Wang
184 · 4 · 0 · 28 Nov 2024

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan
VGen, AI4TS
216 · 30 · 0 · 28 Nov 2024

MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
Akshat Sharma, Hangliang Ding, Jianping Li, Neel Dani, Minjia Zhang
161 · 1 · 0 · 27 Nov 2024

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang
VLM
138 · 8 · 0 · 26 Nov 2024
DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs
Jiahui Liu, Zhenkun Cai, Zhiyong Chen, Minjie Wang
GNN
121 · 1 · 0 · 25 Nov 2024

Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments
Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas
123 · 2 · 0 · 24 Nov 2024

Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng-rong Li
159 · 1 · 0 · 24 Nov 2024

Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
Fahao Chen, Peng Li, Zicong Hong, Zhou Su, Song Guo
MoMe, MoE
126 · 0 · 0 · 23 Nov 2024

FLARE: FP-Less PTQ and Low-ENOB ADC Based AMS-PiM for Error-Resilient, Fast, and Efficient Transformer Acceleration
Donghyeon Yi, Seoyoung Lee, Jongho Kim, Junyoung Kim, Sohmyung Ha, Ik Joon Chang, Minkyu Je
103 · 0 · 0 · 22 Nov 2024
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu
VLM
260 · 0 · 0 · 21 Nov 2024

MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
Ning Ding, Yehui Tang, Haochen Qin, Zhenli Zhou, Chao Xu, Lin Li, Kai Han, Heng Liao, Yunhe Wang
92 · 2 · 0 · 20 Nov 2024

MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah, Shan Lu, Chao Gao, Di Niu
166 · 0 · 0 · 20 Nov 2024

Selective Attention: Enhancing Transformer through Principled Context Control
Xuechen Zhang, Xiangyu Chang, Mingchen Li, Amit K. Roy-Chowdhury, Jiasi Chen, Samet Oymak
129 · 3 · 0 · 19 Nov 2024