
© 2025 ResearchTrend.AI, All rights reserved.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,458 papers shown
Title
Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models
Amey Hengle
Prasoon Bajpai
Soham Dan
Tanmoy Chakraborty
LRM
42
2
0
19 Aug 2024
OccMamba: Semantic Occupancy Prediction with State Space Models
Heng Li
Yuenan Hou
Xiaohan Xing
Xiao Sun
Yanyong Zhang
Mamba
68
5
0
19 Aug 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
68
25
0
19 Aug 2024
Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling
Harry Jake Cunningham
Giorgio Giannone
Mingtian Zhang
M. Deisenroth
58
0
0
18 Aug 2024
HySem: A context length optimized LLM pipeline for unstructured tabular extraction
Narayanan PP
A. P. N. Iyer
68
0
0
18 Aug 2024
Improving VTE Identification through Language Models from Radiology Reports: A Comparative Study of Mamba, Phi-3 Mini, and BERT
Jamie Deng
Yusen Wu
Yelena Yesha
Phuong Nguyen
28
0
0
16 Aug 2024
Instruct Large Language Models to Generate Scientific Literature Survey Step by Step
Yuxuan Lai
Yupeng Wu
Yidan Wang
Wenpeng Hu
Chen Zheng
52
3
0
15 Aug 2024
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
R. Prabhakar
Hengrui Zhang
D. Wentzlaff
36
0
0
14 Aug 2024
Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces
Zhiling Chen
Hanning Chen
Mohsen Imani
Ruimin Chen
Farhad Imani
22
2
0
13 Aug 2024
FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving
Yutao Zhu
Xiaosong Jia
Xinyu Yang
Junchi Yan
ViT
45
2
0
13 Aug 2024
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
Tianyu Liu
Yun Li
Qitan Lv
Kai Liu
Jianchen Zhu
Winston Hu
Xingwu Sun
68
15
0
13 Aug 2024
Body Transformer: Leveraging Robot Embodiment for Policy Learning
Carmelo Sferrazza
Dun-Ming Huang
Fangchen Liu
Jongmin Lee
Pieter Abbeel
LM&Ro
55
13
0
12 Aug 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
104
438
0
12 Aug 2024
Post-Training Sparse Attention with Double Sparsity
Shuo Yang
Ying Sheng
Joseph E. Gonzalez
Ion Stoica
Lianmin Zheng
46
8
0
11 Aug 2024
SAMSA: Efficient Transformer for Many Data Modalities
Minh Lenhat
Viet Anh Nguyen
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong-Son Hy
79
0
0
10 Aug 2024
Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion
Jacob K Christopher
Brian Bartoldson
Tal Ben-Nun
Michael Cardei
B. Kailkhura
Ferdinando Fioretto
DiffM
68
3
0
10 Aug 2024
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
Yilong Chen
Guoxia Wang
Junyuan Shang
Shiyao Cui
Zhenyu Zhang
Tingwen Liu
Shuohuan Wang
Yu Sun
Dianhai Yu
Hua Wu
32
15
0
07 Aug 2024
PRISM: PRogressive dependency maxImization for Scale-invariant image Matching
Xudong Cai
Yongcai Wang
Lun Luo
Minhang Wang
Deying Li
Jintao Xu
Weihao Gu
Rui Ai
46
3
0
07 Aug 2024
AgentsCoMerge: Large Language Model Empowered Collaborative Decision Making for Ramp Merging
Senkang Hu
Zhengru Fang
Zihan Fang
Yiqin Deng
Xianhao Chen
Yuguang Fang
Sam Kwong
71
15
0
07 Aug 2024
Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations
Leo Donisch
Sigurd Schacht
Carsten Lanquillon
55
2
0
06 Aug 2024
XMainframe: A Large Language Model for Mainframe Modernization
Anh T. V. Dau
Hieu Trung Dao
Anh Tuan Nguyen
Hieu Trung Tran
Phong X. Nguyen
Nghi D. Q. Bui
57
1
0
05 Aug 2024
Long Input Benchmark for Russian Analysis
I. Churin
Murat Apishev
Maria Tikhonova
Denis Shevelev
Aydar Bulatov
Yuri Kuratov
Sergej Averkiev
Alena Fenogenova
38
1
0
05 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
84
51
0
05 Aug 2024
Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation
Michael Kolle
Maximilian Zorn
Jongmin Jung
Dasaem Jeong
44
1
0
02 Aug 2024
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency
Jovan Stojkovic
Chaojie Zhang
Íñigo Goiri
Josep Torrellas
Esha Choukse
52
31
0
01 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
57
1
0
01 Aug 2024
Enhanced Structured State Space Models via Grouped FIR Filtering and Attention Sink Mechanisms
Yueran Zhang
Yating Yu
Lingtong Min
Mamba
34
0
0
01 Aug 2024
Beat this! Accurate beat tracking without DBN postprocessing
Francesco Foscarin
Jan Schlüter
Gerhard Widmer
54
5
0
31 Jul 2024
MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training
Rivik Setty
Chengjin Xu
Vinay Setty
Jian Guo
51
12
0
31 Jul 2024
Palu: Compressing KV-Cache with Low-Rank Projection
Chi-Chih Chang
Wei-Cheng Lin
Chien-Yu Lin
Chong-Yan Chen
Yu-Fang Hu
Pei-Shuo Wang
N. Huang
Luis Ceze
Kai-Chiang Wu
59
8
0
30 Jul 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Peng Sun
80
8
0
29 Jul 2024
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain
Pierre Colombo
T. Pires
Malik Boudiaf
Rui Melo
Dominic Culver
Sofia Morgado
Etienne Malaboeuf
Gabriel Hautreux
Johanne Charpentier
Michael Desa
ELM
AILaw
ALM
52
15
0
28 Jul 2024
Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads
Xihui Lin
Yunan Zhang
Suyu Ge
Barun Patra
Vishrav Chaudhary
Hao Peng
Xia Song
43
0
0
25 Jul 2024
u-μP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
63
10
0
24 Jul 2024
Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models
Yida Zhao
Chao Lou
Kewei Tu
90
0
0
24 Jul 2024
Scalify: scale propagation for efficient low-precision LLM training
Paul Balança
Sam Hosegood
Carlo Luschi
Andrew Fitzgibbon
33
2
0
24 Jul 2024
ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency
Yuhang Yao
Han Jin
Alay Dilipbhai Shah
Shanshan Han
Zijian Hu
Yide Ran
Dimitris Stripelis
Zhaozhuo Xu
Salman Avestimehr
Chang D. Yoo
60
1
0
23 Jul 2024
Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack
Xiaoyue Xu
Qinyuan Ye
Xiang Ren
69
7
0
23 Jul 2024
Evaluating Long Range Dependency Handling in Code Generation Models using Multi-Step Key Retrieval
Yannick Assogba
Donghao Ren
59
1
0
23 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
47
2
0
22 Jul 2024
Mamba meets crack segmentation
Zhili He
Yuhao Wang
Mamba
67
3
0
22 Jul 2024
ALLaM: Large Language Models for Arabic and English
M Saiful Bari
Yazeed Alnumay
Norah A. Alzahrani
Nouf M. Alotaibi
H. A. Alyahya
...
Jeril Kuriakose
Abdalghani Abujabal
Nora Al-Twairesh
Areeb Alowisheq
Haidar Khan
47
14
0
22 Jul 2024
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
Cheng Luo
Jiawei Zhao
Zhuoming Chen
Beidi Chen
A. Anandkumar
53
3
0
22 Jul 2024
Point Transformer V3 Extreme: 1st Place Solution for 2024 Waymo Open Dataset Challenge in Semantic Segmentation
Xiaoyang Wu
Xiang Xu
Lingdong Kong
Liang Pan
Ziwei Liu
Tong He
Wanli Ouyang
Hengshuang Zhao
58
0
0
21 Jul 2024
ReAttention: Training-Free Infinite Context with Finite Attention Scope
Xiaoran Liu
Ruixiao Li
Yuerong Song
Zhigeng Liu
Kai Lv
Hang Yan
Linlin Li
Qun Liu
Xipeng Qiu
LLMAG
43
2
0
21 Jul 2024
Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference
Joyjit Kundu
Wenzhe Guo
Ali BanaGozar
Udari De Alwis
Sourav Sengupta
Puneet Gupta
Arindam Mallik
47
5
0
19 Jul 2024
Stable Audio Open
Zach Evans
Julian Parker
CJ Carr
Zack Zukowski
Josiah Taylor
Jordi Pons
102
41
0
19 Jul 2024
Longhorn: State Space Models are Amortized Online Learners
Bo Liu
Rui Wang
Lemeng Wu
Yihao Feng
Peter Stone
Qian Liu
60
11
0
19 Jul 2024
TorchGT: A Holistic System for Large-scale Graph Transformer Training
Mengdie Zhang
Jie Sun
Qi Hu
Peng Sun
Zeke Wang
Yonggang Wen
Tianwei Zhang
GNN
43
2
0
19 Jul 2024
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Qichen Fu
Minsik Cho
Thomas Merth
Sachin Mehta
Mohammad Rastegari
Mahyar Najibi
65
30
0
19 Jul 2024