FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022 · Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré · VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

Showing 50 of 1,432 papers.
• Approximating CKY with Transformers
  Ghazal Khalighinejad, Ollie Liu, Sam Wiseman · 52/2/0 · 03 May 2023

• Key-Locked Rank One Editing for Text-to-Image Personalization
  Yoad Tewel, Rinon Gal, Gal Chechik, Y. Atzmon · DiffM · 143/168/0 · 02 May 2023

• Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
  Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen · 29/13/0 · 01 May 2023

• SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
  Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, F. Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan · 3DPC · 56/53/0 · 27 Apr 2023

• A Cookbook of Self-Supervised Learning
  Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari S. Morcos, Shashank Shekhar, ..., Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum · SyDa, FedML, SSL · 50/274/0 · 24 Apr 2023

• Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens
  Byung-Doh Oh, William Schuler · 48/25/0 · 22 Apr 2023

• Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations
  Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann · VLM · 37/38/0 · 21 Apr 2023

• Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
  Johannes Lehner, Benedikt Alkin, Andreas Fürst, Elisabeth Rumetshofer, Lukas Miklautz, Sepp Hochreiter · 29/18/0 · 20 Apr 2023

• Long-term Forecasting with TiDE: Time-series Dense Encoder
  Abhimanyu Das, Weihao Kong, Andrew B. Leach, Shaan Mathur, Rajat Sen, Rose Yu · AI4TS · 53/239/0 · 17 Apr 2023

• DINOv2: Learning Robust Visual Features without Supervision
  Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Q. Vo, Marc Szafraniec, ..., Hervé Jégou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski · VLM, CLIP, SSL · 119/3,055/0 · 14 Apr 2023

• Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
  Guillaume Jaume, Anurag J. Vaidya, Richard J. Chen, Drew F. K. Williamson, Paul Pu Liang, Faisal Mahmood · 41/43/0 · 13 Apr 2023

• On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
  Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao · VLM · 30/41/0 · 07 Apr 2023

• Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  Stella Biderman, Hailey Schoelkopf, Quentin G. Anthony, Herbie Bradley, Kyle O'Brien, ..., USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal · 36/1,178/0 · 03 Apr 2023

• RPTQ: Reorder-based Post-training Quantization for Large Language Models
  Zhihang Yuan, Lin Niu, Jia-Wen Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu · MQ · 35/79/0 · 03 Apr 2023

• Token Merging for Fast Stable Diffusion
  Daniel Bolya, Judy Hoffman · 35/98/0 · 30 Mar 2023

• An Over-parameterized Exponential Regression
  Yeqi Gao, Sridhar Mahadevan, Zhao Song · 16/36/0 · 29 Mar 2023

• Your Diffusion Model is Secretly a Zero-Shot Classifier
  Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis L Brown, Deepak Pathak · DiffM, VLM · 55/226/0 · 28 Mar 2023

• EVA-CLIP: Improved Training Techniques for CLIP at Scale
  Quan-Sen Sun, Yuxin Fang, Ledell Yu Wu, Xinlong Wang, Yue Cao · CLIP, VLM · 81/470/0 · 27 Mar 2023

• Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
  Cong Wei, Brendan Duke, R. Jiang, P. Aarabi, Graham W. Taylor, Florian Shkurti · ViT · 46/14/0 · 24 Mar 2023

• Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
  Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, Xiangyu Zhang · 3DPC · 56/193/0 · 21 Mar 2023

• EVA-02: A Visual Representation for Neon Genesis
  Yuxin Fang, Quan-Sen Sun, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao · VLM, ViT, CLIP · 40/259/0 · 20 Mar 2023

• SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
  Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, D. DeCoste, Sean Lie, Shreyas Saxena · MoE, AI4CE · 21/18/0 · 18 Mar 2023

• Meet in the Middle: A New Pre-training Paradigm
  A. Nguyen, Nikos Karampatziakis, Weizhu Chen · 13/20/0 · 13 Mar 2023

• Resurrecting Recurrent Neural Networks for Long Sequences
  Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Çağlar Gülçehre, Razvan Pascanu, Soham De · 88/268/0 · 11 Mar 2023

• The style transformer with common knowledge optimization for image-text retrieval
  Wenrui Li, Zhengyu Ma, Jinqiao Shi, Xiaopeng Fan · ViT · 35/5/0 · 01 Mar 2023

• LLaMA: Open and Efficient Foundation Language Models
  Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample · ALM, PILM · 37/12,368/0 · 27 Feb 2023

• AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
  Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, ..., Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica · MoE · 21/68/0 · 22 Feb 2023

• Hyena Hierarchy: Towards Larger Convolutional Language Models
  Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré · VLM · 28/285/0 · 21 Feb 2023

• Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training
  Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang · 33/6/0 · 16 Feb 2023

• Simple Hardware-Efficient Long Convolutions for Sequence Modeling
  Daniel Y. Fu, Elliot L. Epstein, Eric N. D. Nguyen, A. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré · 16/52/0 · 13 Feb 2023

• A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies
  Hongyu Hè, Marko Kabić · 25/2/0 · 13 Feb 2023

• In-Context Learning with Many Demonstration Examples
  Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jinchao Zhang, Zhiyong Wu, Lingpeng Kong · 40/31/0 · 09 Feb 2023

• Efficient Attention via Control Variates
  Lin Zheng, Jianbo Yuan, Chong-Jun Wang, Lingpeng Kong · 34/18/0 · 09 Feb 2023

• Q-Diffusion: Quantizing Diffusion Models
  Xiuyu Li, Yijia Liu, Long Lian, Hua Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer · DiffM, MQ · 41/154/0 · 08 Feb 2023

• Regulating ChatGPT and other Large Generative AI Models
  P. Hacker, A. Engel, M. Mauer · AILaw · 32/328/0 · 05 Feb 2023

• A Survey on Efficient Training of Transformers
  Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen · 31/47/0 · 02 Feb 2023

• Alternating Updates for Efficient Transformers
  Cenk Baykal, D. Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang · MoE · 48/5/0 · 30 Jan 2023

• Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
  Xiaoxia Wu, Cheng-rong Li, Reza Yazdani Aminabadi, Z. Yao, Yuxiong He · MQ · 19/19/0 · 27 Jan 2023

• AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems
  Yuan Feng, Hyeran Jeon, F. Blagojevic, Cyril Guyot, Qing Li, Dong Li · GNN · 27/3/0 · 23 Jan 2023

• FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
  Zhijian Liu, Xinyu Yang, Haotian Tang, Shang Yang, Song Han · 35/64/0 · 20 Jan 2023

• Does compressing activations help model parallel training?
  S. Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman · 19/5/0 · 06 Jan 2023

• Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
  Junjie Yan, Yingfei Liu, Jian-Yuan Sun, Fan Jia, Shuailin Li, Tiancai Wang, Xiangyu Zhang · ViT, 3DPC · 28/55/0 · 03 Jan 2023

• MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding
  Steven H. Wang, Antoine Scardigli, Leonard Tang, Wei Chen, D.M. Levkin, Anya Chen, Spencer Ball, Thomas Woodside, Oliver Zhang, Dan Hendrycks · AILaw, ELM · 35/16/0 · 02 Jan 2023

• Cramming: Training a Language Model on a Single GPU in One Day
  Jonas Geiping, Tom Goldstein · MoE · 30/85/0 · 28 Dec 2022

• Hungry Hungry Hippos: Towards Language Modeling with State Space Models
  Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré · 73/370/0 · 28 Dec 2022

• Pretraining Without Attention
  Junxiong Wang, J. Yan, Albert Gu, Alexander M. Rush · 27/48/0 · 20 Dec 2022

• FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference
  Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, William W. Cohen · VLM · 23/32/0 · 15 Dec 2022

• Elixir: Train a Large Language Model on a Small GPU Cluster
  Haichen Huang, Jiarui Fang, Hongxin Liu, Shenggui Li, Yang You · VLM · 24/7/0 · 10 Dec 2022

• Simplifying and Understanding State Space Models with Diagonal Linear RNNs
  Ankit Gupta, Harsh Mehta, Jonathan Berant · 29/21/0 · 01 Dec 2022

• A Self-Attention Ansatz for Ab-initio Quantum Chemistry
  Ingrid von Glehn, J. Spencer, David Pfau · 26/61/0 · 24 Nov 2022