ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.14135
  4. Cited By
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM
ArXivPDFHTML

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,439 papers shown
Title
Narrative Feature or Structured Feature? A Study of Large Language
  Models to Identify Cancer Patients at Risk of Heart Failure
Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure
Ziyi Chen
Mengyuan Zhang
M. M. Ahmed
Yi Guo
T. George
Jiang Bian
Yonghui Wu
33
2
0
18 Mar 2024
FastDecode: High-Throughput GPU-Efficient LLM Serving using
  Heterogeneous Pipelines
FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
Jiaao He
Jidong Zhai
45
27
0
18 Mar 2024
NeoNeXt: Novel neural network operator and architecture based on the
  patch-wise matrix multiplications
NeoNeXt: Novel neural network operator and architecture based on the patch-wise matrix multiplications
Vladimir Korviakov
Denis Koposov
54
0
0
17 Mar 2024
Is Mamba Effective for Time Series Forecasting?
Is Mamba Effective for Time Series Forecasting?
Zihan Wang
Fanheng Kong
Shi Feng
Ming Wang
Xiaocui Yang
Han Zhao
Daling Wang
Yifei Zhang
Mamba
AI4TS
37
58
0
17 Mar 2024
StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining
StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining
Tushar Kataria
Beatrice Knudsen
Shireen Y. Elhabian
DiffM
MedIm
37
9
0
17 Mar 2024
EfficientMorph: Parameter-Efficient Transformer-Based Architecture for
  3D Image Registration
EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration
Abu Zahid Bin Aziz
Mokshagna Sai Teja Karanam
Tushar Kataria
Shireen Y. Elhabian
ViT
MedIm
47
1
0
16 Mar 2024
NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots
  Using Edge Devices
NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices
Zhiyong Zhang
Huaizu Jiang
H. Singh
40
6
0
15 Mar 2024
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
Hyungjun Oh
Kihong Kim
Jaemin Kim
Sungkyun Kim
Junyeol Lee
Du-Seong Chang
Jiwon Seo
41
0
0
15 Mar 2024
FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical
  Images
FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images
Yiqing Shen
Jingxing Li
Xinyuan Shao
Blanca Inigo Romillo
Ankush Jindal
David Dreizin
Mathias Unberath
MedIm
50
12
0
14 Mar 2024
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot
Adrian Lañcucki
Marcin Chochowski
David Tarjan
Edoardo Ponti
41
50
0
14 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for
  Video Understanding
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
73
0
14 Mar 2024
depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning
  Researchers
depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
Kaichao You
Runsheng Bai
Meng Cao
Jianmin Wang
Ion Stoica
Mingsheng Long
VLM
35
0
0
14 Mar 2024
BurstAttention: An Efficient Distributed Attention Framework for
  Extremely Long Sequences
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Sun Ao
Weilin Zhao
Xu Han
Cheng Yang
Zhiyuan Liu
Chuan Shi
Maosong Sun
GNN
40
8
0
14 Mar 2024
PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of
  Interest
PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest
Jiajun Deng
Sha Zhang
Feras Dayoub
Wanli Ouyang
Yanyong Zhang
Ian Reid
3DPC
54
4
0
14 Mar 2024
SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash
  Attention to Achieve 30 times Acceleration
SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration
Yanfei Song
Bangzheng Pu
Peng Wang
Hongxu Jiang
Dong Dong
Yongxiang Cao
Yiqing Shen
VLM
48
12
0
14 Mar 2024
Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient
  Generative Inference
Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
Muhammad Adnan
Akhil Arunkumar
Gaurav Jain
Prashant J. Nair
Ilya Soloveychik
Purushotham Kamath
41
52
0
14 Mar 2024
The Garden of Forking Paths: Observing Dynamic Parameters Distribution
  in Large Language Models
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini
Jacopo Staiano
Bruno Lepri
Raffaele Marino
MoE
39
1
0
13 Mar 2024
Bifurcated Attention: Accelerating Massively Parallel Decoding with
  Shared Prefixes in LLMs
Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs
Ben Athiwaratkun
Sujan Kumar Gonugondla
Sanjay Krishna Gouda
Haifeng Qian
Hantian Ding
...
Liangfu Chen
Parminder Bhatia
Ramesh Nallapati
Sudipta Sengupta
Bing Xiang
59
4
0
13 Mar 2024
Language models scale reliably with over-training and on downstream
  tasks
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
108
42
0
13 Mar 2024
StreamingDialogue: Prolonged Dialogue Learning via Long Context
  Compression with Minimal Losses
StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses
Jia-Nan Li
Quan Tu
Cunli Mao
Zhengtao Yu
Ji-Rong Wen
Rui Yan
OffRL
34
3
0
13 Mar 2024
CHAI: Clustered Head Attention for Efficient LLM Inference
CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal
Bilge Acun
Basil Homer
Mostafa Elhoushi
Yejin Lee
Shivaram Venkataraman
Dimitris Papailiopoulos
Carole-Jean Wu
63
8
0
12 Mar 2024
Rethinking Generative Large Language Model Evaluation for Semantic
  Comprehension
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
Fangyun Wei
Xi Chen
Linzi Luo
ELM
ALM
LRM
38
7
0
12 Mar 2024
Characterization of Large Language Model Development in the Datacenter
Characterization of Large Language Model Development in the Datacenter
Qi Hu
Zhisheng Ye
Zerui Wang
Guoteng Wang
Mengdie Zhang
...
Dahua Lin
Xiaolin Wang
Yingwei Luo
Yonggang Wen
Tianwei Zhang
56
45
0
12 Mar 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference
  Acceleration for Large Vision-Language Models
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen
Haozhe Zhao
Tianyu Liu
Shuai Bai
Junyang Lin
Chang Zhou
Baobao Chang
MLLM
VLM
59
122
0
11 Mar 2024
RepoHyper: Better Context Retrieval Is All You Need for Repository-Level
  Code Completion
RepoHyper: Better Context Retrieval Is All You Need for Repository-Level Code Completion
Huy N. Phan
Hoang N. Phan
Tien N. Nguyen
Nghi D. Q. Bui
50
3
0
10 Mar 2024
Algorithmic progress in language models
Algorithmic progress in language models
Anson Ho
T. Besiroglu
Ege Erdil
David Owen
Robi Rahman
Zifan Carl Guo
David Atkinson
Neil Thompson
J. Sevilla
42
16
0
09 Mar 2024
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless
  Generative Inference of LLM
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
Hao Kang
Qingru Zhang
Souvik Kundu
Geonhwa Jeong
Zaoxing Liu
Tushar Krishna
Tuo Zhao
MQ
49
80
0
08 Mar 2024
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like
  Speed
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
Yifan Wang
Xingyi He He
Sida Peng
Dongli Tan
Xiaowei Zhou
3DV
41
43
0
07 Mar 2024
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self
  Attention at the Threadblock Level
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level
Ali Hassani
Wen-mei W. Hwu
Humphrey Shi
44
7
0
07 Mar 2024
Yi: Open Foundation Models by 01.AI
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
150
512
0
07 Mar 2024
Low-Resource Court Judgment Summarization for Common Law Systems
Low-Resource Court Judgment Summarization for Common Law Systems
Shuaiqi Liu
Jiannong Cao
Yicong Li
Ruosong Yang
Zhiyuan Wen
ELM
AILaw
34
2
0
07 Mar 2024
Mastering Memory Tasks with World Models
Mastering Memory Tasks with World Models
Mohammad Reza Samsami
Artem Zholus
Janarthanan Rajendran
Sarath Chandar
CLL
OffRL
44
24
0
07 Mar 2024
RATSF: Empowering Customer Service Volume Management through
  Retrieval-Augmented Time-Series Forecasting
RATSF: Empowering Customer Service Volume Management through Retrieval-Augmented Time-Series Forecasting
Tianfeng Wang
Gaojie Cui
AI4TS
54
0
0
07 Mar 2024
SaulLM-7B: A pioneering Large Language Model for Law
SaulLM-7B: A pioneering Large Language Model for Law
Pierre Colombo
T. Pires
Malik Boudiaf
Dominic Culver
Rui Melo
...
Andre F. T. Martins
Fabrizio Esposito
Vera Lúcia Raposo
Sofia Morgado
Michael Desa
ELM
AILaw
57
68
0
06 Mar 2024
AcceleratedLiNGAM: Learning Causal DAGs at the speed of GPUs
AcceleratedLiNGAM: Learning Causal DAGs at the speed of GPUs
Victor Akinwande
J. Zico Kolter
CML
33
1
0
06 Mar 2024
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large
  Language Models
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
Zexuan Qiu
Jingjing Li
Shijue Huang
Wanjun Zhong
Irwin King
ELM
ALM
58
3
0
06 Mar 2024
Slot Abstractors: Toward Scalable Abstract Visual Reasoning
Slot Abstractors: Toward Scalable Abstract Visual Reasoning
S. S. Mondal
Jonathan D. Cohen
Taylor W. Webb
OCL
45
9
0
06 Mar 2024
Reliable, Adaptable, and Attributable Language Models with Retrieval
Reliable, Adaptable, and Attributable Language Models with Retrieval
Akari Asai
Zexuan Zhong
Danqi Chen
Pang Wei Koh
Luke Zettlemoyer
Hanna Hajishirzi
Wen-tau Yih
KELM
RALM
54
55
0
05 Mar 2024
TaylorShift: Shifting the Complexity of Self-Attention from Squared to
  Linear (and Back) using Taylor-Softmax
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
Tobias Christian Nauen
Sebastián M. Palacio
Andreas Dengel
56
3
0
05 Mar 2024
Vision-Language Models for Medical Report Generation and Visual Question
  Answering: A Review
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
Iryna Hartsock
Ghulam Rasool
57
64
0
04 Mar 2024
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal
Nitin Kedia
Ashish Panwar
Jayashree Mohan
Nipun Kwatra
Bhargav S. Gulavani
Alexey Tumanov
Ramachandran Ramjee
55
165
0
04 Mar 2024
Large Language Model-Based Evolutionary Optimizer: Reasoning with
  elitism
Large Language Model-Based Evolutionary Optimizer: Reasoning with elitism
Shuvayan Brahmachary
Subodh M. Joshi
Aniruddha Panda
K. Koneripalli
A. Sagotra
Harshil Patel
Ankush Sharma
Ameya Dilip Jagtap
Kaushic Kalyanaraman
LRM
65
19
0
04 Mar 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan
Weiyun Wang
Zhe Chen
Xizhou Zhu
Lewei Lu
Tong Lu
Yu Qiao
Hongsheng Li
Jifeng Dai
Wenhai Wang
ViT
46
44
0
04 Mar 2024
Respiratory motion forecasting with online learning of recurrent neural
  networks for safety enhancement in externally guided radiotherapy
Respiratory motion forecasting with online learning of recurrent neural networks for safety enhancement in externally guided radiotherapy
Michel Pohl
Mitsuru Uesaka
Hiroyuki Takahashi
K. Demachi
R. B. Chhatkuli
34
2
0
03 Mar 2024
On the Compressibility of Quantized Large Language Models
On the Compressibility of Quantized Large Language Models
Yu Mao
Weilan Wang
Hongchao Du
Nan Guan
Chun Jason Xue
MQ
36
6
0
03 Mar 2024
LM4OPT: Unveiling the Potential of Large Language Models in Formulating
  Mathematical Optimization Problems
LM4OPT: Unveiling the Potential of Large Language Models in Formulating Mathematical Optimization Problems
Tasnim Ahmed
Salimur Choudhury
33
11
0
02 Mar 2024
NoMAD-Attention: Efficient LLM Inference on CPUs Through
  Multiply-add-free Attention
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
Tianyi Zhang
Jonah Yi
Bowen Yao
Zhaozhuo Xu
Anshumali Shrivastava
MQ
27
6
0
02 Mar 2024
Accelerating Greedy Coordinate Gradient via Probe Sampling
Accelerating Greedy Coordinate Gradient via Probe Sampling
Yiran Zhao
Wenyue Zheng
Tianle Cai
Xuan Long Do
Kenji Kawaguchi
Anirudh Goyal
Michael Shieh
51
11
0
02 Mar 2024
HeteGen: Heterogeneous Parallel Inference for Large Language Models on
  Resource-Constrained Devices
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
Xuanlei Zhao
Bin Jia
Hao Zhou
Ziming Liu
Shenggan Cheng
Yang You
34
4
0
02 Mar 2024
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text
  Summaries
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries
Zelalem Gero
Chandan Singh
Yiqing Xie
Sheng Zhang
Tristan Naumann
Jianfeng Gao
Hoifung Poon
ELM
ALM
44
4
0
01 Mar 2024
Previous
123...171819...272829
Next