ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,510 papers shown
LFMamba: Light Field Image Super-Resolution with State Space Model
Wang Xia
Yao Lu
Shunzhou Wang
Ziqi Wang
Peiqi Xia
Tianfei Zhou
Mamba
113
4
0
18 Jun 2024
Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters
Zhiyu Guo
Hidetaka Kamigaito
Taro Watanabe
94
27
0
18 Jun 2024
TroL: Traversal of Layers for Large Language and Vision Models
Byung-Kwan Lee
Sangyun Chung
Chae Won Kim
Beomchan Park
Yong Man Ro
111
7
0
18 Jun 2024
A Scalable and Effective Alternative to Graph Transformers
Kaan Sancak
Zhigang Hua
Jin Fang
Yan Xie
Andrey Malevich
Bo Long
M. F. Balin
Ümit V. Çatalyürek
90
1
0
17 Jun 2024
Promises, Outlooks and Challenges of Diffusion Language Modeling
Justin Deschenaux
Çağlar Gülçehre
DiffM
84
3
0
17 Jun 2024
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Qianchao Zhu
Jiangfei Duan
Chang Chen
Siran Liu
Xiuhong Li
...
Huanqi Cao
Xiao Chuanfu
Xingcheng Zhang
Dahua Lin
Chao Yang
110
17
0
17 Jun 2024
HARE: HumAn pRiors, a key to small language model Efficiency
Lingyun Zhang
Bin Jin
Gaojian Ge
Lunhui Liu
Xuewen Shen
Mingyong Wu
Houqian Zhang
Yongneng Jiang
Shiqi Chen
Shi Pu
ALM
68
0
0
17 Jun 2024
Optimized Speculative Sampling for GPU Hardware Accelerators
Dominik Wagner
Seanie Lee
Ilja Baumann
Philipp Seeberger
Korbinian Riedhammer
Tobias Bocklet
104
3
0
16 Jun 2024
Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
Junru Lu
Jiazheng Li
Siyu An
Meng Zhao
Yulan He
Di Yin
Xing Sun
94
20
0
16 Jun 2024
Breaking the Attention Bottleneck
Kalle Hilsenbek
139
0
0
16 Jun 2024
New Solutions on LLM Acceleration, Optimization, and Application
Yingbing Huang
Lily Jiaxin Wan
Hanchen Ye
Manvi Jha
Jinghua Wang
Yuhong Li
Xiaofan Zhang
Deming Chen
89
12
0
16 Jun 2024
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Jiaming Tang
Yilong Zhao
Kan Zhu
Guangxuan Xiao
Baris Kasikci
Song Han
137
106
0
16 Jun 2024
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Yuan Pu
Yazhe Niu
Jiyuan Ren
Zhenjie Yang
Hongsheng Li
Yu Liu
OffRL
227
2
0
15 Jun 2024
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models
Yuchen Ren
Zhiyuan Chen
Lifeng Qiao
Hongtai Jing
Yuchen Cai
...
Siqi Sun
Hongliang Yan
Dong Yuan
Wanli Ouyang
Xihui Liu
83
10
0
14 Jun 2024
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
Nameer Hirschkind
Xiao Yu
Mahesh Kumar Nandwana
Joseph Liu
Eloi DuBois
...
Colin Sinclair
Kyle Spence
Charles Shang
Zoë Abrams
Morgan McGuire
66
0
0
14 Jun 2024
Towards Scalable and Versatile Weight Space Learning
Konstantin Schürholt
Michael W. Mahoney
Damian Borth
107
19
0
14 Jun 2024
GEB-1.3B: Open Lightweight Large Language Model
Jie Wu
Yufeng Zhu
Lei Shen
Xuqing Lu
ALM
44
0
0
14 Jun 2024
Optimal Kernel Orchestration for Tensor Programs with Korch
Muyan Hu
Ashwin Venkatram
Shreyashri Biswas
Balamurugan Marimuthu
Bohan Hou
Gabriele Oliaro
Haojie Wang
Liyan Zheng
Xupeng Miao
Jidong Zhai
396
6
0
13 Jun 2024
Multimodal Table Understanding
Mingyu Zheng
Xinwei Feng
Q. Si
Qiaoqiao She
Zheng Lin
Wenbin Jiang
Weiping Wang
LMTD VLM
145
20
0
12 Jun 2024
Sustainable self-supervised learning for speech representations
Luis Lugo
Valentin Vielzeuf
94
2
0
11 Jun 2024
QuickLLaMA: Query-aware Inference Acceleration for Large Language Models
Jingyao Li
Han Shi
Xin Jiang
Zhenguo Li
Hong Xu
Jiaya Jia
LRM
61
2
0
11 Jun 2024
Markov Constraint as Large Language Model Surrogate
Alexandre Bonlarron
Jean-Charles Régin
56
2
0
11 Jun 2024
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You
Yichao Fu
Zheng Wang
Amir Yazdanbakhsh
Yingyan Celine Lin
133
4
0
11 Jun 2024
Needle In A Multimodal Haystack
Weiyun Wang
Shuibo Zhang
Yiming Ren
Yuchen Duan
Tiantong Li
...
Ping Luo
Yu Qiao
Jifeng Dai
Wenqi Shao
Wenhai Wang
VLM
116
24
0
11 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
182
69
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
134
301
0
10 Jun 2024
Symmetric Dot-Product Attention for Efficient Training of BERT Language Models
Martin Courtois
Malte Ostendorff
Leonhard Hennig
Georg Rehm
91
2
0
10 Jun 2024
DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
Simon Doll
Niklas Hanselmann
Lukas Schneider
Richard Schulz
Marius Cordts
Markus Enzweiler
Hendrik P. A. Lensch
75
8
0
10 Jun 2024
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You
Yipin Guo
Yichao Fu
Wei Zhou
Huihong Shi
Xiaofan Zhang
Souvik Kundu
Amir Yazdanbakhsh
Y. Lin
KELM
119
11
0
10 Jun 2024
SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models
Hengyu Zhang
RALM
100
2
0
09 Jun 2024
MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training
Bo Chen
Zhilei Bei
Xingyi Cheng
Pan Li
Jie Tang
Le Song
154
4
0
08 Jun 2024
Beyond Efficiency: Scaling AI Sustainably
Carole-Jean Wu
Bilge Acun
Ramya Raghavendra
Kim Hazelwood
GNN
100
19
0
08 Jun 2024
Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction
Ke Cheng
Wen Hu
Zhi Wang
Peng Du
Jianguo Li
Sheng Zhang
126
11
0
07 Jun 2024
LLM-based speaker diarization correction: A generalizable approach
Georgios Efstathiadis
Vijay Yadav
Anzar Abbas
120
4
0
07 Jun 2024
Proofread: Fixes All Errors with One Tap
Renjie Liu
Yanxiang Zhang
Yun Zhu
Haicheng Sun
Yuanbo Zhang
Michael Xuelin Huang
Shanqing Cai
Lei Meng
Shumin Zhai
ALM
68
3
0
06 Jun 2024
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Théodor Lemerle
Nicolas Obin
Axel Roebel
66
6
0
06 Jun 2024
Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness
L. Hillebrand
Prabhupad Pradhan
Christian Bauckhage
R. Sifa
51
1
0
06 Jun 2024
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Artem Zholus
Maksim Kuznetsov
Roman Schutski
Rim Shayakhmetov
Daniil Polykovskiy
Sarath Chandar
Alex Zhavoronkov
DiffM AI4CE
90
6
0
06 Jun 2024
OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
Dujian Ding
Bicheng Xu
L. Lakshmanan
VLM
116
2
0
06 Jun 2024
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Ao Sun
Weilin Zhao
Xu Han
Cheng Yang
Zhiyuan Liu
Chuan Shi
Maosong Sun
96
8
0
05 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
119
1
0
05 Jun 2024
Training of Physical Neural Networks
Ali Momeni
Babak Rahmani
B. Scellier
Logan G. Wright
Peter L. McMahon
...
Julie Grollier
Andrea J. Liu
D. Psaltis
Andrea Alù
Romain Fleury
PINN AI4CE
123
17
0
05 Jun 2024
Llumnix: Dynamic Scheduling for Large Language Model Serving
Biao Sun
Ziming Huang
Hanyu Zhao
Wencong Xiao
Xinyi Zhang
Yong Li
Wei Lin
93
57
0
05 Jun 2024
Balancing Performance and Efficiency in Zero-shot Robotic Navigation
Dmytro Kuzmenko
N. Shvai
LM&Ro
76
0
0
05 Jun 2024
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Alex Jinpeng Wang
Linjie Li
Yiqi Lin
Min Li
Lijuan Wang
Mike Zheng Shou
VLM
96
5
0
04 Jun 2024
Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania
Siddharth Singh
Shwai He
Soheil Feizi
A. Bhatele
110
22
0
04 Jun 2024
Scalable MatMul-free Language Modeling
Rui-Jie Zhu
Yu Zhang
Ethan Sifferman
Tyler Sheaves
Yiqiao Wang
Dustin Richmond
P. Zhou
Jason K. Eshraghian
94
22
0
04 Jun 2024
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Namgyu Ho
Sangmin Bae
Taehyeon Kim
Hyunjik Jo
Yireun Kim
Tal Schuster
Adam Fisch
James Thorne
Se-Young Yun
107
9
0
04 Jun 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
Jingshu Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Yanzhe Zhang
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
119
106
0
04 Jun 2024
Learning to Edit Visual Programs with Self-Supervision
R. K. Jones
Renhao Zhang
Aditya Ganeshan
Daniel E. Ritchie
SSL
86
3
0
04 Jun 2024