FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM
arXiv:2205.14135 · PDF · HTML
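For readers skimming this citation index, the sketch below illustrates the idea named in the paper's title: computing exact attention over blocks of K/V with an online softmax, so the full N×N score matrix is never materialized. It is a minimal NumPy reference under assumed shapes and a hypothetical `tiled_attention` function, not the authors' fused CUDA kernel.

```python
# Illustrative sketch only (assumption: plain NumPy, not the paper's GPU kernel).
# Exact attention is accumulated block-by-block over K/V with an online softmax.
import numpy as np

def tiled_attention(Q, K, V, block_size=128):
    """Exact softmax attention computed without storing the N x N score matrix."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((N, d))
    row_max = np.full(N, -np.inf)   # running max of scores for each query row
    row_sum = np.zeros(N)           # running softmax denominator for each row

    for start in range(0, N, block_size):
        Kb = K[start:start + block_size]        # current key block
        Vb = V[start:start + block_size]        # current value block
        scores = (Q @ Kb.T) * scale             # (N, block) partial scores

        new_max = np.maximum(row_max, scores.max(axis=1))
        correction = np.exp(row_max - new_max)  # rescale earlier partial sums
        p = np.exp(scores - new_max[:, None])   # block-local exponentials

        out = out * correction[:, None] + p @ Vb
        row_sum = row_sum * correction + p.sum(axis=1)
        row_max = new_max

    return out / row_sum[:, None]

# Sanity check against the naive (fully materialized) attention formula.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(64)
P = np.exp(S - S.max(axis=1, keepdims=True))
assert np.allclose(tiled_attention(Q, K, V), (P / P.sum(axis=1, keepdims=True)) @ V)
```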

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,449 papers shown
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
Volkan Cevher
Yida Wang
George Karypis
62
3
0
12 Jul 2024
Flash normalization: fast normalization for LLMs
Nils Graef
Matthew Clapp
Andrew Wasielewski
23
0
0
12 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
79
120
0
11 Jul 2024
HDT: Hierarchical Document Transformer
Haoyu He
Markus Flicke
Jan Buchmann
Iryna Gurevych
Andreas Geiger
48
0
0
11 Jul 2024
NinjaLLM: Fast, Scalable and Cost-effective RAG using Amazon SageMaker and AWS Trainium and Inferentia2
Tengfei Xue
Xuefeng Li
Roman Smirnov
Tahir Azim
Arash Sadrieh
Babak Pahlavan
20
0
0
11 Jul 2024
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities
Jerry Huang
62
7
0
11 Jul 2024
Inference Performance Optimization for Large Language Models on CPUs
Pujiang He
Shan Zhou
Wenhuan Huang
Changqing Li
Duyi Wang
Bin Guo
Chen Meng
Sheng Gui
Weifei Yu
Yi Xie
46
4
0
10 Jul 2024
B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory
Luca Zancato
Arjun Seshadri
Yonatan Dukler
Aditya Golatkar
Yantao Shen
Benjamin Bowman
Matthew Trager
Alessandro Achille
Stefano Soatto
49
8
0
08 Jul 2024
LLMBox: A Comprehensive Library for Large Language Models
Tianyi Tang
Yiwen Hu
Bingqian Li
Wenyang Luo
Zijing Qin
...
Chunxuan Xia
Junyi Li
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
56
1
0
08 Jul 2024
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model
Brenden Smith
Dallin Baker
Clayton Chase
Myles Barney
Kaden Parker
Makenna Allred
Peter Hu
Alex Evans
Nancy Fulda
47
0
0
04 Jul 2024
DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs
Zhen Tan
Daize Dong
Xinyu Zhao
Jie Peng
Yu Cheng
Tianlong Chen
MoE
52
4
0
03 Jul 2024
Solving Motion Planning Tasks with a Scalable Generative Model
Yihan Hu
Siqi Chai
Zhening Yang
Jingyu Qian
Kun Li
Wenxin Shao
Haichao Zhang
Wei Xu
Qiang Liu
53
18
0
03 Jul 2024
Let the Code LLM Edit Itself When You Edit the Code
Zhenyu He
Jun Zhang
Shengjie Luo
Jingjing Xu
Zongzhang Zhang
Di He
KELM
50
1
0
03 Jul 2024
Deep Learning Based Apparent Diffusion Coefficient Map Generation from Multi-parametric MR Images for Patients with Diffuse Gliomas
Zach Eidex
Mojtaba Safari
Jacob F. Wynne
Richard L. J. Qiu
Tonghe Wang
David Viar Hernandez
Hui-Kuo Shu
H. Mao
Xiaofeng Yang
DiffM
MedIm
43
1
0
02 Jul 2024
GemmAr: Enhancing LLMs Through Arabic Instruction-Tuning
Hasna Chouikhi
Manel Aloui
Cyrine Ben Hammou
Ghaith Chaabane
Haithem Kchaou
Chehir Dhaouadi
49
0
0
02 Jul 2024
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence
Francis Williams
Jiahui Huang
Jonathan Swartz
G. Klár
Vijay Thakkar
...
Ruilong Li
Clement Fuji-Tsang
Sanja Fidler
Eftychios Sifakis
Ken Museth
27
11
0
01 Jul 2024
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan
Hongyi Liu
Shaochen Zhong
Yu-Neng Chuang
...
Hongye Jin
Vipin Chaudhary
Zhaozhuo Xu
Zirui Liu
Xia Hu
53
18
0
01 Jul 2024
Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge
Nick Eliopoulos
Purvish Jajal
James Davis
Gaowen Liu
George K. Thiravathukal
Yung-Hsiang Lu
56
1
0
01 Jul 2024
Tree Search for Language Model Agents
Jing Yu Koh
Stephen Marcus McAleer
Daniel Fried
Ruslan Salakhutdinov
LM&Ro
LLMAG
LRM
64
62
0
01 Jul 2024
$\text{Memory}^3$: Language Modeling with Explicit Memory
Hongkang Yang
Zehao Lin
Wenjin Wang
Hao Wu
Zhiyu Li
...
Yu Yu
Kai Chen
Feiyu Xiong
Linpeng Tang
Weinan E
50
12
0
01 Jul 2024
SE(3)-Hyena Operator for Scalable Equivariant Learning
Artem Moskalev
Mangal Prakash
Rui Liao
Tommaso Mansi
65
2
0
01 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
49
8
0
30 Jun 2024
WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem
Ziming Liu
Shaoyu Wang
Shenggan Cheng
Zhongkai Zhao
Xuanlei Zhao
James Demmel
Yang You
40
0
0
30 Jun 2024
Teola: Towards End-to-End Optimization of LLM-based Applications
Xin Tan
Yimin Jiang
Yitao Yang
Hong-Yu Xu
78
5
0
29 Jun 2024
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
Wonbeom Lee
Jungi Lee
Junghwan Seo
Jaewoong Sim
RALM
34
77
0
28 Jun 2024
Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation
Malvina Nikandrou
Georgios Pantazopoulos
Ioannis Konstas
Alessandro Suglia
37
0
0
27 Jun 2024
LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism
Diandian Gu
Peng Sun
Qinghao Hu
Ting Huang
Xun Chen
...
Jiarui Fang
Yonggang Wen
Tianwei Zhang
Xin Jin
Xuanzhe Liu
LRM
53
7
0
26 Jun 2024
MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
Cunchen Hu
Heyang Huang
Junhao Hu
Jiang Xu
Xusheng Chen
...
Chenxi Wang
Sa Wang
Yungang Bao
Ninghui Sun
Yizhou Shan
LLMAG
72
25
0
25 Jun 2024
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Sean Welleck
Amanda Bertsch
Matthew Finlayson
Hailey Schoelkopf
Alex Xie
Graham Neubig
Ilia Kulikov
Zaid Harchaoui
44
52
0
24 Jun 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
35
19
0
24 Jun 2024
Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba
Yuchen Zou
Yineng Chen
Zuchao Li
Lefei Zhang
Hai Zhao
70
1
0
24 Jun 2024
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
Xiuying Wei
Skander Moalla
Razvan Pascanu
Çağlar Gülçehre
53
0
0
24 Jun 2024
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Cheng-Yu Hsieh
Yung-Sung Chuang
Chun-Liang Li
Zifeng Wang
Long T. Le
...
James R. Glass
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
57
33
0
23 Jun 2024
RuleR: Improving LLM Controllability by Rule-based Data Recycling
Ming Li
Han Chen
Chenguang Wang
Dang Nguyen
Dianqi Li
Dinesh Manocha
43
7
0
22 Jun 2024
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
Ziyan Jiang
Xueguang Ma
Wenhu Chen
RALM
62
49
0
21 Jun 2024
SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning
Matthias Weissenbacher
Rishabh Agarwal
Yoshinobu Kawahara
OffRL
36
1
0
21 Jun 2024
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
Tianyu Fu
Haofeng Huang
Xuefei Ning
Genghan Zhang
Boju Chen
...
Shiyao Li
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MQ
52
17
0
21 Jun 2024
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
Qi Liu
Bo Wang
Nan Wang
Jiaxin Mao
RALM
88
3
0
21 Jun 2024
RouteFinder: Towards Foundation Models for Vehicle Routing Problems
Federico Berto
Chuanbo Hua
Nayeli Gast Zepeda
André Hottung
N. Wouda
Leon Lan
Kevin Tierney
J. Park
Jinkyoo Park
65
12
0
21 Jun 2024
LiveMind: Low-latency Large Language Models with Simultaneous Inference
Chuangtao Chen
Grace Li Zhang
Xunzhao Yin
Cheng Zhuo
Ulf Schlichtmann
Bing Li
LRM
60
3
0
20 Jun 2024
Dye4AI: Assuring Data Boundary on Generative AI Services
Shu Wang
Kun Sun
Yan Zhai
47
1
0
20 Jun 2024
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish
Itamar Zimerman
Shady Abu Hussein
Nadav Cohen
Amir Globerson
Lior Wolf
Raja Giryes
Mamba
88
14
0
20 Jun 2024
Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
Ke Cheng
Wen Hu
Zhi Wang
Hongen Peng
Jianguo Li
Sheng Zhang
57
7
0
19 Jun 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
79
532
0
18 Jun 2024
LFMamba: Light Field Image Super-Resolution with State Space Model
Wang xia
Yao Lu
Shunzhou Wang
Ziqi Wang
Peiqi Xia
Tianfei Zhou
Mamba
76
4
0
18 Jun 2024
Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters
Zhiyu Guo
Hidetaka Kamigaito
Taro Watanabe
43
21
0
18 Jun 2024
TroL: Traversal of Layers for Large Language and Vision Models
Byung-Kwan Lee
Sangyun Chung
Chae Won Kim
Beomchan Park
Yong Man Ro
56
6
0
18 Jun 2024
A Scalable and Effective Alternative to Graph Transformers
Kaan Sancak
Zhigang Hua
Jin Fang
Yan Xie
Andrey Malevich
Bo Long
M. F. Balin
Ümit V. Çatalyürek
57
1
0
17 Jun 2024
Promises, Outlooks and Challenges of Diffusion Language Modeling
Justin Deschenaux
Çağlar Gülçehre
DiffM
62
3
0
17 Jun 2024
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Qianchao Zhu
Jiangfei Duan
Chang Chen
Siran Liu
Xiuhong Li
...
Huanqi Cao
Xiao Chuanfu
Xingcheng Zhang
Dahua Lin
Chao Yang
30
16
0
17 Jun 2024