arXiv: 2205.14135
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,508 papers shown
A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators
M. Emani
Sam Foreman
Varuni K. Sastry
Zhen Xie
Siddhisanket Raskar
William Arnold
R. Thakur
V. Vishwanath
M. Papka
ELM
73
10
0
06 Oct 2023
How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
Josh Alman
Zhao Song
127
37
0
06 Oct 2023
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Ke Wang
Houxing Ren
Aojun Zhou
Zimu Lu
Sichun Luo
Weikang Shi
Renrui Zhang
Linqi Song
Mingjie Zhan
Hongsheng Li
ReLM LRM SyDa
119
106
0
05 Oct 2023
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Dacheng Li
Rulin Shao
Anze Xie
Eric P. Xing
Xuezhe Ma
Ion Stoica
Joseph E. Gonzalez
Hao Zhang
97
22
0
05 Oct 2023
Retrieval meets Long Context Large Language Models
Peng Xu
Ming-Yu Liu
Xianchao Wu
Lawrence C. McAfee
Chen Zhu
Zihan Liu
Sandeep Subramanian
Evelina Bakhturina
Mohammad Shoeybi
Bryan Catanzaro
RALM LRM
91
86
0
04 Oct 2023
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
Ido Amos
Jonathan Berant
Ankit Gupta
110
29
0
04 Oct 2023
RoFormer for Position Aware Multiple Instance Learning in Whole Slide Image Classification
Etienne Pochet
Rami Maroun
Roger Trullo
MedIm
55
2
0
03 Oct 2023
Ring Attention with Blockwise Transformers for Near-Infinite Context
Hao Liu
Matei A. Zaharia
Pieter Abbeel
105
258
0
03 Oct 2023
SEA: Sparse Linear Attention with Estimated Attention Mask
Heejun Lee
Jina Kim
Jeffrey Willette
Sung Ju Hwang
162
7
0
03 Oct 2023
PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
Praneeth Kacham
Vahab Mirrokni
Peilin Zhong
95
14
0
02 Oct 2023
CAT-LM: Training Language Models on Aligned Code And Tests
Nikitha Rao
Kush Jain
Uri Alon
Claire Le Goues
Vincent J. Hellendoorn
ALM
83
47
0
02 Oct 2023
GRID: A Platform for General Robot Intelligence Development
Sai H. Vemprala
Shuhang Chen
Abhinav Shukla
Dinesh Narayanan
Ashish Kapoor
74
10
0
02 Oct 2023
Learning Type Inference for Enhanced Dataflow Analysis
Lukas Seidel
Sedick Baker Effendi
Xavier Pinho
Konrad Rieck
Brink van der Merwe
Fabian Yamaguchi
60
2
0
01 Oct 2023
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Chia-Yuan Chang
Helen Zhou
77
11
0
01 Oct 2023
Efficient Streaming Language Models with Attention Sinks
Guangxuan Xiao
Yuandong Tian
Beidi Chen
Song Han
Mike Lewis
AI4TS RALM
162
791
0
29 Sep 2023
GAIA-1: A Generative World Model for Autonomous Driving
Anthony Hu
Lloyd Russell
Hudson Yeo
Zak Murez
George Fedoseev
Alex Kendall
Jamie Shotton
Gianluca Corrado
VGen
130
252
0
29 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
94
17
0
28 Sep 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
279
1,922
0
28 Sep 2023
AtomSurf : Surface Representation for Learning on Protein Structures
Vincent Mallet
Souhaib Attaiki
M. Ovsjanikov
93
3
0
28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Lucas D. Lingle
105
17
0
28 Sep 2023
Predicting performance difficulty from piano sheet music images
Yingwei Ma
J. J. Valero-Mas
Yu Jiang
Changjian Wang
396
2
0
28 Sep 2023
Attention Sorting Combats Recency Bias In Long Context Language Models
A. Peysakhovich
Adam Lerer
LRM RALM
121
52
0
28 Sep 2023
Masked Autoencoders are Scalable Learners of Cellular Morphology
Oren Z. Kraus
Kian Kenyon-Dean
Saber Saberian
Maryam Fallah
Peter McLean
...
Chi Vicky Cheng
Kristen Morse
Maureen Makes
Ben Mabey
Berton Earnshaw
82
15
0
27 Sep 2023
Effective Long-Context Scaling of Foundation Models
Wenhan Xiong
Jingyu Liu
Igor Molybog
Hejia Zhang
Prajjwal Bhargava
...
Dániel Baráth
Sergey Edunov
Mike Lewis
Sinong Wang
Hao Ma
134
230
0
27 Sep 2023
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
William Chen
Jiatong Shi
Brian Yan
Dan Berrebbi
Wangyou Zhang
Yifan Peng
Xuankai Chang
Soumi Maiti
Shinji Watanabe
83
10
0
26 Sep 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
S. A. Jacobs
Masahiro Tanaka
Chengming Zhang
Minjia Zhang
L. Song
Samyam Rajbhandari
Yuxiong He
79
121
0
25 Sep 2023
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
Ayush Kaushal
Tejas Vaidhya
Irina Rish
121
16
0
25 Sep 2023
MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
Kailai Yang
Tianlin Zhang
Zi-Zhou Kuang
Qianqian Xie
Jimin Huang
Sophia Ananiadou
AI4MH
91
58
0
24 Sep 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM ALM
152
39
0
23 Sep 2023
AntiBARTy Diffusion for Property Guided Antibody Design
Jordan Venderley
DiffM
47
1
0
22 Sep 2023
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Yukang Chen
Shengju Qian
Haotian Tang
Xin Lai
Zhijian Liu
Song Han
Jiaya Jia
165
170
0
21 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
106
199
0
20 Sep 2023
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić
Dylan R. Ashley
Oleg Serikov
Louis Kirsch
Francesco Faccio
Jürgen Schmidhuber
Thomas Hofmann
Imanol Schlag
MoE
86
9
0
20 Sep 2023
SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen
Tianhua Tao
Liqun Ma
Willie Neiswanger
Zhengzhong Liu
...
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric Xing
105
50
0
19 Sep 2023
FoleyGen: Visually-Guided Audio Generation
Xinhao Mei
Varun K. Nagaraja
Gaël Le Lan
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
VGen
88
23
0
19 Sep 2023
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Dawei Zhu
Nan Yang
Liang Wang
Yifan Song
Wenhao Wu
Furu Wei
Sujian Li
159
89
0
19 Sep 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM LRM
328
755
0
19 Sep 2023
Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF
Simeng Sun
Dhawal Gupta
Mohit Iyyer
89
20
0
16 Sep 2023
Enhance audio generation controllability through representation similarity regularization
Yangyang Shi
Gaël Le Lan
Varun K. Nagaraja
Zhaoheng Ni
Xinhao Mei
Ernie Chang
Forrest N. Iandola
Yang Liu
Vikas Chandra
66
1
0
15 Sep 2023
Replacing softmax with ReLU in Vision Transformers
Mitchell Wortsman
Jaehoon Lee
Justin Gilmer
Simon Kornblith
ViT
91
33
0
15 Sep 2023
CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending
Shiyi Zhu
Jingting Ye
Wei Jiang
Siqiao Xue
Qi Zhang
Yifan Wu
Jianguo Li
41
4
0
15 Sep 2023
Less is More for Long Document Summary Evaluation by LLMs
Yunshu Wu
Hayate Iso
Pouya Pezeshkpour
Nikita Bhutani
Estevam R. Hruschka
102
37
0
14 Sep 2023
Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors
J. Pata
Eric Wulff
Farouk Mokhtar
D. Southwick
Mengke Zhang
M. Girone
Javier Duarte
81
1
0
13 Sep 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
206
2,338
0
12 Sep 2023
CaloClouds II: Ultra-Fast Geometry-Independent Highly-Granular Calorimeter Simulation
E. Buhmann
F. Gaede
Gregor Kasieczka
A. Korol
W. Korcari
K. Krüger
Peter McKeown
DiffM
83
26
0
11 Sep 2023
Textbooks Are All You Need II: phi-1.5 technical report
Yuanzhi Li
Sébastien Bubeck
Ronen Eldan
Allison Del Giorno
Suriya Gunasekar
Yin Tat Lee
ALM LRM
171
482
0
11 Sep 2023
Evaluating the Deductive Competence of Large Language Models
S. M. Seals
V. Shalin
ELM ReLM LRM
80
10
0
11 Sep 2023
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
Liang Li
Qingyuan Li
Bo Zhang
Xiangxiang Chu
MQ
105
34
0
06 Sep 2023
Music Source Separation with Band-Split RoPE Transformer
Wei-Tsung Lu
Ju-Chiang Wang
Qiuqiang Kong
Yun-Ning Hung
102
25
0
05 Sep 2023
Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes
Sunjun Kweon
Junu Kim
Jiyoun Kim
Sujeong Im
Eunbyeol Cho
...
Seungjin Baek
Chang Hoon Han
Yoon Bin Jung
Yohan Jo
Edward Choi
LM&MA ELM
92
41
0
01 Sep 2023