FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
VLM
arXiv: 2205.14135
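For readers skimming the citation list below, a brief usage sketch may help situate what the cited kernel does in practice. This is an illustrative assumption, not the authors' reference implementation: it calls PyTorch's torch.nn.functional.scaled_dot_product_attention, which on supported GPUs dispatches to a FlashAttention-style fused backend that computes exact softmax(QK^T / sqrt(d)) V in on-chip tiles rather than materializing the full attention matrix in HBM. Shapes and dtypes are hypothetical.

import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration only.
batch, heads, seq_len, head_dim = 2, 8, 1024, 64
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) causal attention; the fused kernel avoids storing
# the seq_len x seq_len score matrix off-chip, which is the IO saving the
# FlashAttention paper targets.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])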

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

Showing 50 of 1,508 citing papers.

LimGen: Probing the LLMs for Generating Suggestive Limitations of Research Papers
Abdur Rahman Bin Md Faizullah, Ashok Urlana, Rahul Mishra
22 Mar 2024

Hierarchical Skip Decoding for Efficient Autoregressive Text Generation
Yunqi Zhu, Xuebing Yang, Yuanyuan Wu, Wensheng Zhang
22 Mar 2024

ZigMa: A DiT-style Zigzag Mamba Diffusion Model
Vincent Tao Hu, S. A. Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes S. Fischer, Bjorn Ommer
20 Mar 2024

Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts
Guangzeng Han, Weisi Liu, Xiaolei Huang, Brian Borsari
20 Mar 2024

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo, Zhangchi Feng, Yongqiang Ma
20 Mar 2024

Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks
Bo-Ru Lu, Nikita Haduong, Chien-Yu Lin, Hao Cheng, Noah A. Smith, Mari Ostendorf
19 Mar 2024 · AI4CE

MELTing point: Mobile Evaluation of Language Transformers
Stefanos Laskaridis, Kleomenis Katevas, Lorenzo Minto, Hamed Haddadi
19 Mar 2024

WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
Runwei Guan, Liye Jia, Fengyufan Yang, Shanliang Yao, Erick Purwanto, ..., Eng Gee Lim, Jeremy S. Smith, Ka Lok Man, Xuming Hu, Yutao Yue
19 Mar 2024

HCPM: Hierarchical Candidates Pruning for Efficient Detector-Free Matching
Ying Chen, Yong-Jin Liu, Kai Wu, Qiang Nie, Shang Xu, Huifang Ma, Bing Wang, Chengjie Wang
19 Mar 2024 · VLM

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu
19 Mar 2024 · MQ

HDLdebugger: Streamlining HDL debugging with Large Language Models
Xufeng Yao, Haoyang Li, T. H. Chan, Wenyi Xiao, Mingxuan Yuan, Yu Huang, Lei Chen, Bei Yu
18 Mar 2024

Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure
Ziyi Chen, Mengyuan Zhang, M. M. Ahmed, Yi Guo, T. George, Jiang Bian, Yonghui Wu
18 Mar 2024

FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
Jiaao He, Jidong Zhai
18 Mar 2024

NeoNeXt: Novel neural network operator and architecture based on the patch-wise matrix multiplications
Vladimir Korviakov, Denis Koposov
17 Mar 2024

Is Mamba Effective for Time Series Forecasting?
Zihan Wang, Fanheng Kong, Shi Feng, Ming Wang, Xiaocui Yang, Han Zhao, Daling Wang, Yifei Zhang
17 Mar 2024 · Mamba, AI4TS

StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining
Tushar Kataria, Beatrice Knudsen, Shireen Y. Elhabian
17 Mar 2024 · DiffM, MedIm

EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration
Abu Zahid Bin Aziz, Mokshagna Sai Teja Karanam, Tushar Kataria, Shireen Y. Elhabian
16 Mar 2024 · ViT, MedIm

NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices
Zhiyong Zhang, Huaizu Jiang, H. Singh
15 Mar 2024

ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
Hyungjun Oh, Kihong Kim, Jaemin Kim, Sungkyun Kim, Junyeol Lee, Du-Seong Chang, Jiwon Seo
15 Mar 2024

FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images
Yiqing Shen, Jingxing Li, Xinyuan Shao, Blanca Inigo Romillo, Ankush Jindal, David Dreizin, Mathias Unberath
14 Mar 2024 · MedIm

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski, David Tarjan, Edoardo Ponti
14 Mar 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang
14 Mar 2024 · Mamba

depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
Kaichao You, Runsheng Bai, Meng Cao, Jianmin Wang, Ion Stoica, Mingsheng Long
14 Mar 2024 · VLM

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun
14 Mar 2024 · GNN

PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest
Jiajun Deng, Sha Zhang, Feras Dayoub, Wanli Ouyang, Yanyong Zhang, Ian Reid
14 Mar 2024 · 3DPC

SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration
Yanfei Song, Bangzheng Pu, Peng Wang, Hongxu Jiang, Dong Dong, Yongxiang Cao, Yiqing Shen
14 Mar 2024 · VLM

Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, Purushotham Kamath
14 Mar 2024

The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini, Jacopo Staiano, Bruno Lepri, Raffaele Marino
13 Mar 2024 · MoE

Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs
Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, ..., Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang
13 Mar 2024

Language models scale reliably with over-training and on downstream tasks
S. Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, ..., Y. Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt
13 Mar 2024 · ALM, ELM, LRM

StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses
Jia-Nan Li, Quan Tu, Cunli Mao, Zhengtao Yu, Ji-Rong Wen, Rui Yan
13 Mar 2024 · OffRL

CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Homer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu
12 Mar 2024

Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
Fangyun Wei, Xi Chen, Linzi Luo
12 Mar 2024 · ELM, ALM, LRM

Characterization of Large Language Model Development in the Datacenter
Qi Hu, Zhisheng Ye, Zerui Wang, Guoteng Wang, Mengdie Zhang, ..., Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang
12 Mar 2024

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang
11 Mar 2024 · MLLM, VLM

Algorithmic progress in language models
Anson Ho, T. Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, J. Sevilla
09 Mar 2024

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao
08 Mar 2024 · MQ

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
Yifan Wang, Xingyi He, Sida Peng, Dongli Tan, Xiaowei Zhou
07 Mar 2024 · 3DV

Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level
Ali Hassani, Wen-mei W. Hwu, Humphrey Shi
07 Mar 2024

Yi: Open Foundation Models by 01.AI
01.AI, Alex Young, Bei Chen, Chao Li, ..., Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai
07 Mar 2024 · OSLM, LRM

Low-Resource Court Judgment Summarization for Common Law Systems
Shuaiqi Liu, Jiannong Cao, Yicong Li, Ruosong Yang, Zhiyuan Wen
07 Mar 2024 · ELM, AILaw

Mastering Memory Tasks with World Models
Mohammad Reza Samsami, Artem Zholus, Janarthanan Rajendran, Sarath Chandar
07 Mar 2024 · CLL, OffRL

RATSF: Empowering Customer Service Volume Management through Retrieval-Augmented Time-Series Forecasting
Tianfeng Wang, Gaojie Cui
07 Mar 2024 · AI4TS

SaulLM-7B: A pioneering Large Language Model for Law
Pierre Colombo, T. Pires, Malik Boudiaf, Dominic Culver, Rui Melo, ..., Andre F. T. Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, Michael Desa
06 Mar 2024 · ELM, AILaw

AcceleratedLiNGAM: Learning Causal DAGs at the speed of GPUs
Victor Akinwande, J. Zico Kolter
06 Mar 2024 · CML

CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
Zexuan Qiu, Jingjing Li, Shijue Huang, Wanjun Zhong, Irwin King
06 Mar 2024 · ELM, ALM

Slot Abstractors: Toward Scalable Abstract Visual Reasoning
S. S. Mondal, Jonathan D. Cohen, Taylor W. Webb
06 Mar 2024 · OCL

Reliable, Adaptable, and Attributable Language Models with Retrieval
Akari Asai, Zexuan Zhong, Danqi Chen, Pang Wei Koh, Luke Zettlemoyer, Hanna Hajishirzi, Wen-tau Yih
05 Mar 2024 · KELM, RALM

TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
Tobias Christian Nauen, Sebastián M. Palacio, Andreas Dengel
05 Mar 2024

Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
Iryna Hartsock, Ghulam Rasool
04 Mar 2024