ResearchTrend.AI

arXiv:2205.14135
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,508 papers shown
The Case for Co-Designing Model Architectures with Hardware
Quentin G. Anthony
Jacob Hatef
Deepak Narayanan
Stella Biderman
Stas Bekman
Junqi Yin
Hari Subramoni
Dhabaleswar Panda
3DV
45
6
0
25 Jan 2024
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Feng-Huei Lin
Hanling Yi
Hongbin Li
Yifan Yang
Xiaotian Yu
Guangming Lu
Rong Xiao
86
4
0
23 Jan 2024
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
Chenjie Cao
Xinlin Ren
Yanwei Fu
92
29
0
22 Jan 2024
With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation
Y. Wang
D. Ma
D. Cai
RALM
99
20
0
21 Jan 2024
AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology
Rongqing Cong
Wenyang He
Mingxuan Li
Bangning Luo
Zebin Yang
Yuchao Yang
Ru Huang
Bonan Yan
27
3
0
21 Jan 2024
Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads
Cunchen Hu
Heyang Huang
Liangliang Xu
Xusheng Chen
Jiang Xu
...
Chenxi Wang
Sa Wang
Yungang Bao
Ninghui Sun
Yizhou Shan
DRL
103
77
0
20 Jan 2024
AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
Xuanlei Zhao
Shenggan Cheng
Guangyang Lu
Jiarui Fang
Hao Zhou
Bin Jia
Ziming Liu
Yang You
MQ
82
3
0
19 Jan 2024
PHOENIX: Open-Source Language Adaption for Direct Preference Optimization
Matthias Uhlig
Sigurd Schacht
Sudarshan Kamath Barkur
ALM
52
1
0
19 Jan 2024
Knowledge Fusion of Large Language Models
Fanqi Wan
Xinting Huang
Deng Cai
Xiaojun Quan
Wei Bi
Shuming Shi
MoMe
106
73
0
19 Jan 2024
Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder
Yongchen Zhou
Richard Jiang
44
0
0
18 Jan 2024
Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation
Zdeněk Kasner
Ondrej Dusek
107
11
0
18 Jan 2024
Towards Principled Graph Transformers
Luis Muller
Daniel Kusuma
Blai Bonet
Christopher Morris
81
4
0
18 Jan 2024
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Yinmin Zhong
Shengyu Liu
Junda Chen
Jianbo Hu
Yibo Zhu
Xuanzhe Liu
Xin Jin
Hao Zhang
97
206
0
18 Jan 2024
Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
Yao Lu
Song Bian
Lequn Chen
Yongjun He
Yulong Hui
...
Huanchen Zhang
Minjia Zhang
Qizhen Zhang
Tianyi Zhou
Danyang Zhuo
93
7
0
17 Jan 2024
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
Qiaoling Chen
Diandian Gu
Guoteng Wang
Xun Chen
Yingtong Xiong
...
Qi Hu
Xin Jin
Yonggang Wen
Tianwei Zhang
Peng Sun
101
8
0
17 Jan 2024
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
Cong Guo
Rui Zhang
Jiale Xu
Jingwen Leng
Zihan Liu
...
Minyi Guo
Hao Wu
Shouren Zhao
Junping Zhao
Ke Zhang
VLM
125
12
0
16 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro, LLMAG
181
41
0
16 Jan 2024
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Saurav Pawar
S.M. Towhidul Islam Tonmoy
S. M. M. Zaman
Vinija Jain
Aman Chadha
Amitava Das
68
29
0
15 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Chak Tou Leong
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
160
130
0
15 Jan 2024
Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization
Ninglu Shao
Shitao Xiao
Zheng Liu
Peitian Zhang
64
4
0
15 Jan 2024
Extending LLMs' Context Window with 100 Samples
Yikai Zhang
Junlong Li
Pengfei Liu
89
12
0
13 Jan 2024
DocFinQA: A Long-Context Financial Reasoning Dataset
Varshini Reddy
Rik Koncel-Kedziorski
Viet Dac Lai
Michael Krumdick
Charles Lovering
Chris Tanner
RALM
78
21
0
12 Jan 2024
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Mingdao Liu
Aohan Zeng
Bowen Wang
Peng Zhang
Jie Tang
Yuxiao Dong
127
10
0
12 Jan 2024
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning
Yutao Zhu
Peitian Zhang
Chenghao Zhang
Yifei Chen
Binyu Xie
Zheng Liu
Ji-Rong Wen
Zhicheng Dou
62
17
0
12 Jan 2024
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein
Bo Chen
Xingyi Cheng
Pan Li
Yangli-ao Geng
Jing Gong
...
Chiming Liu
Aohan Zeng
Yuxiao Dong
Jie Tang
Leo T. Song
81
112
0
11 Jan 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong
Zhiqi Li
Yuntao Chen
Feng Wang
Xizhou Zhu
...
Hongsheng Li
Yu Qiao
Lewei Lu
Jie Zhou
Jifeng Dai
67
62
0
11 Jan 2024
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Zhen Qin
Weigao Sun
Dong Li
Xuyang Shen
Weixuan Sun
Yiran Zhong
119
28
0
09 Jan 2024
Masked Audio Generation using a Single Non-Autoregressive Transformer
Alon Ziv
Itai Gat
Gaël Le Lan
Tal Remez
Felix Kreuk
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
110
40
0
09 Jan 2024
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Connor Holmes
Masahiro Tanaka
Michael Wyatt
A. A. Awan
Jeff Rasley
...
Reza Yazdani Aminabadi
Heyang Qin
Arash Bakhtiari
Lev Kurilenko
Yuxiong He
87
71
0
09 Jan 2024
TeleChat Technical Report
Zhongjiang He
Zihan Wang
Xinzhan Liu
Shixuan Liu
Yitong Yao
...
Zilu Huang
Sishi Xiong
Yuxiang Zhang
Chao Wang
Shuangyong Song
AI4MH, LRM, ALM
87
4
0
08 Jan 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM, ALM
206
381
0
05 Jan 2024
Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task
Gabriel Lino Garcia
P. H. Paiola
Luis Henrique Morelli
Giovani Candido
Arnaldo Cândido Júnior
D. Jodas
Luis C. S. Afonso
I. R. Guilherme
B. Penteado
João Paulo Papa
42
13
0
05 Jan 2024
Large Language Models in Plant Biology
H. Lam
Xing Er Ong
Marek Mutwil
41
23
0
05 Jan 2024
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Bin Lin
Chen Zhang
Tao Peng
Hanyu Zhao
Wencong Xiao
...
Shen Li
Zhigang Ji
Tao Xie
Yong Li
Wei Lin
121
54
0
05 Jan 2024
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
Linyuan Gong
Mostafa Elhoushi
Alvin Cheung
145
18
0
05 Jan 2024
Re-evaluating the Memory-balanced Pipeline Parallelism: BPipe
Mincong Huang
Chao Wang
Chi Ma
Yineng Zhang
Peng Zhang
Lei Yu
40
1
0
04 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xintao Hu
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
157
79
0
04 Jan 2024
Transformer Neural Autoregressive Flows
Massimiliano Patacchiola
Aliaksandra Shysheya
Katja Hofmann
Richard Turner
TPM
44
3
0
03 Jan 2024
PLLaMa: An Open-source Large Language Model for Plant Science
Xianjun Yang
Junfeng Gao
Wenxin Xue
Erik Alexandersson
84
19
0
03 Jan 2024
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets
Ernest Perkowski
Boyao Wang
Tuan Dung Nguyen
Yuan-Sen Ting
Sandor Kruk
...
Michael J. Smith
Huiling Liu
Kevin Schawinski
K. Iyer
I. Ciucă
AI4MH
85
12
0
03 Jan 2024
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Zirui Liu
Chia-Yuan Chang
Huiyuan Chen
Helen Zhou
124
118
0
02 Jan 2024
Quokka: An Open-source Large Language Model ChatBot for Material Science
Xianjun Yang
Stephen D. Wilson
Linda R. Petzold
OSLM
74
2
0
02 Jan 2024
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
Chenhang He
Ruihuang Li
Guowen Zhang
Lei Zhang
64
7
0
01 Jan 2024
Building Efficient Universal Classifiers with Natural Language Inference
Moritz Laurer
W. Atteveldt
Andreu Casas
Kasper Welbers
90
8
0
29 Dec 2023
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Jacob P. Portes
Alex Trott
Sam Havens
Daniel King
Abhinav Venigalla
Moin Nadeem
Nikhil Sardana
D. Khudia
Jonathan Frankle
104
18
0
29 Dec 2023
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
83
15
0
28 Dec 2023
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
Guansong Lu
Yuanfan Guo
Jianhua Han
Minzhe Niu
Yihan Zeng
Songcen Xu
Zeyi Huang
Zhao Zhong
Wei Zhang
Hang Xu
73
4
0
27 Dec 2023
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
106
18
0
25 Dec 2023
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen
Jiahao Zhang
Yixiao Du
Shaojie Xiang
Zichao Yue
Niansong Zhang
Yaohui Cai
Zhiru Zhang
117
40
0
23 Dec 2023
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
Alicia Golden
Samuel Hsia
Fei Sun
Bilge Acun
Basil Hosmer
...
Zachary DeVito
Jeff Johnson
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
VLM, DiffM
112
8
0
22 Dec 2023