FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,438 papers shown
Optimizing Distributed Training on Frontier for Large Language Models
Sajal Dash
Isaac Lyngaas
Junqi Yin
Xiao Wang
Romain Egele
Guojing Cong
Feiyi Wang
Prasanna Balaprakash
ALM
MoE
91
13
0
20 Dec 2023
A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library
Ganesh Bikshandi
Jay Shah
14
7
0
19 Dec 2023
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar
Yongqin Xian
A. Tonioni
Andrew Zisserman
Federico Tombari
38
12
0
19 Dec 2023
Efficient LLM inference solution on Intel GPU
Hui Wu
Yi Gan
Feng Yuan
Jing Ma
Wei Zhu
...
Hong Zhu
Yuhua Zhu
Xiaoli Liu
Jinghui Gu
Peng Zhao
32
3
0
19 Dec 2023
A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models
Harsh Sharma
Pratyush Dhingra
J. Doppa
Ümit Y. Ogras
P. Pande
34
7
0
18 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
38
1
0
18 Dec 2023
Linear Attention via Orthogonal Memory
Jun Zhang
Shuyang Jiang
Jiangtao Feng
Lin Zheng
Lingpeng Kong
40
3
0
18 Dec 2023
StarVector: Generating Scalable Vector Graphics Code from Images
Juan A. Rodriguez
Shubham Agarwal
I. Laradji
Pau Rodríguez
David Vazquez
Christopher Pal
M. Pedersoli
51
6
0
17 Dec 2023
SPT: Fine-Tuning Transformer-based Language Models Efficiently with Sparsification
Yuntao Gui
Xiao Yan
Peiqi Yin
Han Yang
James Cheng
43
2
0
16 Dec 2023
Extending Context Window of Large Language Models via Semantic Compression
Weizhi Fei
Xueyan Niu
Pingyi Zhou
Lu Hou
Bo Bai
Lei Deng
Wei Han
46
27
0
15 Dec 2023
Marathon: A Race Through the Realm of Long Context with Large Language Models
Lei Zhang
Yunshui Li
Ziqiang Liu
Jiaxi Yang
Junhao Liu
Longze Chen
Run Luo
Min Yang
OffRL
LRM
45
6
0
15 Dec 2023
Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning
Avelina Asada Hadji-Kyriacou
Ognjen Arandjelović
35
0
0
14 Dec 2023
Motion Flow Matching for Human Motion Synthesis and Editing
Vincent Tao Hu
Wenzhe Yin
Pingchuan Ma
Yunlu Chen
Basura Fernando
Yuki M. Asano
E. Gavves
Pascal Mettes
Bjorn Ommer
Cees G. M. Snoek
DiffM
37
19
0
14 Dec 2023
TigerBot: An Open Multilingual Multitask LLM
Ye Chen
Wei Cai
Liangming Wu
Xiaowei Li
Zhanxuan Xin
Cong Fu
135
11
0
14 Dec 2023
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Kaiqiang Song
Xiaoyang Wang
Sangwoo Cho
Xiaoman Pan
Dong Yu
36
7
0
14 Dec 2023
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Róbert Csordás
Piotr Piekos
Kazuki Irie
Jürgen Schmidhuber
MoE
28
14
0
13 Dec 2023
On a Foundation Model for Operating Systems
Divyanshu Saxena
Nihal Sharma
Donghyun Kim
Rohit Dwivedula
Jiayi Chen
...
Alex Dimakis
P. B. Godfrey
Daehyeok Kim
Chris Rossbach
Gang Wang
47
2
0
13 Dec 2023
SGLang: Efficient Execution of Structured Language Model Programs
Lianmin Zheng
Liangsheng Yin
Zhiqiang Xie
Chuyue Sun
Jeff Huang
...
Christos Kozyrakis
Ion Stoica
Joseph E. Gonzalez
Clark W. Barrett
Ying Sheng
LRM
42
117
0
12 Dec 2023
DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers
S. Chandy
Varun Gangal
Yi Yang
Gabriel Maggiotti
35
0
0
11 Dec 2023
Gated Linear Attention Transformers with Hardware-Efficient Training
Aaron Courville
Bailin Wang
Songlin Yang
Yikang Shen
Yoon Kim
48
144
0
11 Dec 2023
DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers
Aaron Mir
Eduardo Alonso
Esther Mondragón
DiffM
45
2
0
11 Dec 2023
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM
MLLM
27
38
0
11 Dec 2023
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models
Zhihang Yuan
Yuzhang Shang
Yue Song
Qiang Wu
Yan Yan
Guangyu Sun
MQ
40
44
0
10 Dec 2023
Batched Low-Rank Adaptation of Foundation Models
Yeming Wen
Swarat Chaudhuri
OffRL
29
19
0
09 Dec 2023
Stateful Large Language Model Serving with Pensieve
Lingfan Yu
Jinyang Li
RALM
KELM
LLMAG
44
12
0
09 Dec 2023
ESPN: Memory-Efficient Multi-Vector Information Retrieval
Susav Shrestha
Narasimha Reddy
Zongwang Li
34
6
0
09 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
41
30
0
08 Dec 2023
Trajeglish: Traffic Modeling as Next-Token Prediction
Jonah Philion
Xue Bin Peng
Sanja Fidler
23
21
0
07 Dec 2023
A Hardware Evaluation Framework for Large Language Model Inference
Hengrui Zhang
August Ning
R. Prabhakar
D. Wentzlaff
ELM
35
17
0
05 Dec 2023
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models
Xinyu Crystina Zhang
Sebastian Hofstatter
Patrick Lewis
Raphael Tang
Jimmy J. Lin
LRM
KELM
ELM
RALM
ALM
40
6
0
05 Dec 2023
Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data
Yu Yang
Aaditya K. Singh
Mostafa Elhoushi
Anas Mahmoud
Kushal Tirumala
Fabian Gloeckle
Baptiste Rozière
Carole-Jean Wu
Ari S. Morcos
Newsha Ardalani
AAML
SyDa
41
10
0
05 Dec 2023
Efficient Online Data Mixing For Language Model Pre-Training
Alon Albalak
Liangming Pan
Colin Raffel
Wei Wang
32
34
0
05 Dec 2023
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Bill Yuchen Lin
Abhilasha Ravichander
Ximing Lu
Nouha Dziri
Melanie Sclar
Khyathi Raghavi Chandu
Chandra Bhagavatula
Yejin Choi
22
169
0
04 Dec 2023
Recurrent Distance Filtering for Graph Representation Learning
Yuhui Ding
Antonio Orvieto
Bobby He
Thomas Hofmann
GNN
36
6
0
03 Dec 2023
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
James Enouen
Hootan Nakhost
Sayna Ebrahimi
Sercan Ö. Arik
Yan Liu
Tomas Pfister
33
5
0
03 Dec 2023
Token Fusion: Bridging the Gap between Token Pruning and Token Merging
Minchul Kim
Shangqian Gao
Yen-Chang Hsu
Yilin Shen
Hongxia Jin
31
32
0
02 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
29
22
0
01 Dec 2023
Nonparametric Variational Regularisation of Pretrained Transformers
Fabio Fehr
James Henderson
43
0
0
01 Dec 2023
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
Kai Lv
Shuo Zhang
Tianle Gu
Shuhao Xing
Jiawei Hong
...
Tengxiao Liu
Yu Sun
Penousal Machado
Hang Yan
Xipeng Qiu
38
7
0
01 Dec 2023
Dimension Mixer: A Generalized Method for Structured Sparsity in Deep Neural Networks
Suman Sapkota
Binod Bhattarai
37
0
0
30 Nov 2023
Splitwise: Efficient generative LLM inference using phase splitting
Pratyush Patel
Esha Choukse
Chaojie Zhang
Aashaka Shah
Íñigo Goiri
Saeed Maleki
Ricardo Bianchini
58
203
0
30 Nov 2023
HOT: Higher-Order Dynamic Graph Representation Learning with Efficient Transformers
Maciej Besta
Afonso Claudino Catarino
Lukas Gianinazzi
Nils Blach
Piotr Nyczyk
H. Niewiadomski
Torsten Hoefler
35
6
0
30 Nov 2023
Perceptual Group Tokenizer: Building Perception with Iterative Grouping
Zhiwei Deng
Ting Chen
Yang Li
ViT
VLM
29
2
0
30 Nov 2023
Diffusion Models Without Attention
Jing Nathan Yan
Jiatao Gu
Alexander M. Rush
35
61
0
30 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Guohao Li
38
25
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
87
413
0
28 Nov 2023
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Shafiq Joty
ELM
CLL
AI4MH
LRM
ALM
85
27
0
28 Nov 2023
On the Long Range Abilities of Transformers
Itamar Zimerman
Lior Wolf
35
7
0
28 Nov 2023
Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization
Jinhao Li
Jiaming Xu
Shiyao Li
Shan Huang
Jun Liu
Yaoxiu Lian
Guohao Dai
MQ
31
3
0
28 Nov 2023
Swallowing the Bitter Pill: Simplified Scalable Conformer Generation
Yuyang Wang
Ahmed A. A. Elhag
Navdeep Jaitly
J. Susskind
Miguel Angel Bautista
DiffM
32
20
0
27 Nov 2023