ResearchTrend.AI


FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
arXiv:2205.14135

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,510 papers shown
Libra: Building Decoupled Vision System on Large Language Models
Yifan Xu
Xiaoshan Yang
Y. Song
Changsheng Xu
MLLM VLM
94
8
0
16 May 2024
An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding
Renqi Chen
Wenwei Han
Haohao Zhang
Haoyang Su
Zhefan Wang
Xiaolei Liu
Hao Jiang
Wanli Ouyang
Nanqing Dong
24
1
0
15 May 2024
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Zhimin Li
Jianwei Zhang
Qin Lin
Jiangfeng Xiong
Yanxin Long
...
Wei Liu
Dingyong Wang
Yong Yang
Jie Jiang
Qinglin Lu
ViT
139
120
0
14 May 2024
Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning
Alain Riou
Stefan Lattner
Gaëtan Hadjeres
Geoffroy Peeters
69
2
0
14 May 2024
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
58
4
0
14 May 2024
Self-Distillation Improves DNA Sequence Inference
Tong Yu
Lei Cheng
Ruslan Khalitov
Erland Brandser Olsson
Zhirong Yang
SyDa
80
1
0
14 May 2024
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
Siyuan Li
Zedong Wang
Zicheng Liu
Di Wu
Cheng Tan
Jiangbin Zheng
Yufei Huang
Stan Z. Li
75
8
0
13 May 2024
USP: A Unified Sequence Parallelism Approach for Long Context Generative AI
Jiarui Fang
Shangchun Zhao
102
24
0
13 May 2024
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
Yunsheng Ni
Chuanjian Liu
Yehui Tang
Kai Han
Yunhe Wang
107
1
0
13 May 2024
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
R. Prabhakar
R. Sivaramakrishnan
Darshan Gandhi
Yun Du
Mingran Wang
...
Urmish Thakker
Dawei Huang
Sumti Jairath
Kevin J. Brown
K. Olukotun
MoE
77
15
0
13 May 2024
DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger
Ofek Glick
Chaim Baskin
Yonatan Belinkov
126
0
0
13 May 2024
CaFA: Global Weather Forecasting with Factorized Attention on Sphere
Zijie Li
Anthony Zhou
Saurabh Patil
A. Farimani
94
6
0
12 May 2024
Smurfs: Multi-Agent System using Context-Efficient DFSDT for Tool Planning
Junzhi Chen
Juhao Liang
Benyou Wang
LLMAG
83
4
0
09 May 2024
Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity
Zhufeng Li
S. S. Cranganore
Nicholas D. Youngblut
Niki Kilbertus
117
2
0
09 May 2024
KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Minsik Cho
Mohammad Rastegari
Devang Naik
78
4
0
08 May 2024
Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking
Emre Can Acikgoz
Mete Erdogan
Deniz Yuret
80
8
0
07 May 2024
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu
Ajay Nayak
Jayashree Mohan
Ramachandran Ramjee
Ashish Panwar
VLM
164
29
0
07 May 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Mayank Mishra
Matt Stallone
Gaoyuan Zhang
Songlin Yang
Aditya Prasad
...
Amith Singhee
Nirmit Desai
David D. Cox
Ruchir Puri
Yikang Shen
AI4TS
130
74
0
07 May 2024
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
Tianyi Zhang
Jonah Yi
Zhaozhuo Xu
Anshumali Shrivastava
MQ
68
32
0
07 May 2024
Data-Efficient Molecular Generation with Hierarchical Textual Inversion
Seojin Kim
Jaehyun Nam
Sihyun Yu
Younghoon Shin
Jinwoo Shin
129
3
0
05 May 2024
IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs
Yuzhen Mao
Martin Ester
Ke Li
81
6
0
05 May 2024
Is Flash Attention Stable?
Alicia Golden
Samuel Hsia
Fei Sun
Bilge Acun
Basil Hosmer
...
Zachary DeVito
Jeff Johnson
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
67
5
0
05 May 2024
DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling
Shikhar Tuli
Chi-Heng Lin
Yen-Chang Hsu
N. Jha
Yilin Shen
Hongxia Jin
AI4CE
50
3
0
01 May 2024
DAM: A Universal Dual Attention Mechanism for Multimodal Timeseries Cryptocurrency Trend Forecasting
Yihang Fu
Mingyu Zhou
Luyao Zhang
AI4TS
86
6
0
01 May 2024
Lightplane: Highly-Scalable Components for Neural 3D Fields
Ang Cao
Justin Johnson
Andrea Vedaldi
David Novotny
99
9
0
30 Apr 2024
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu
Enmao Diao
102
4
0
30 Apr 2024
Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics
J. Michaelov
Catherine Arnett
Benjamin Bergen
82
4
0
30 Apr 2024
Building a Large Japanese Web Corpus for Large Language Models
Naoaki Okazaki
Kakeru Hattori
Hirai Shota
Hiroki Iida
Masanari Ohi
Kazuki Fujii
Taishi Nakamura
Mengsay Loem
Rio Yokota
Sakae Mizuki
110
7
0
27 Apr 2024
Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services
Jiachen Liu
Zhiyu Wu
Jae-Won Chung
Fan Lai
Myungjin Lee
Mosharaf Chowdhury
96
29
0
25 Apr 2024
CORM: Cache Optimization with Recent Message for Large Language Model Inference
Jincheng Dai
Zhuowei Huang
Haiyun Jiang
Chen Chen
Deng Cai
Wei Bi
Shuming Shi
109
3
0
24 Apr 2024
Nyonic Technical Report
Junfeng Tian
Rui Wang
Cong Li
Yudong Zhou
Jun Liu
Jun Wang
58
1
0
24 Apr 2024
Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models
Elijah Pelofske
Vincent Urias
L. Liebrock
86
0
0
24 Apr 2024
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Sachin Mehta
Maxwell Horton
Fartash Faghri
Mohammad Hossein Sekhavat
Mahyar Najibi
Mehrdad Farajtabar
Oncel Tuzel
Mohammad Rastegari
VLM CLIP
69
7
0
24 Apr 2024
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
João Monteiro
Étienne Marcotte
Pierre-Andre Noel
Valentina Zantedeschi
David Vázquez
Nicolas Chapados
Christopher Pal
Perouz Taslakian
77
5
0
23 Apr 2024
Automated Multi-Language to English Machine Translation Using Generative Pre-Trained Transformers
Elijah Pelofske
Vincent Urias
L. Liebrock
87
0
0
23 Apr 2024
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Sachin Mehta
Mohammad Hossein Sekhavat
Qingqing Cao
Maxwell Horton
Yanzi Jin
...
Iman Mirzadeh
Mahyar Najibi
Dmitry Belenko
Peter Zatloukal
Mohammad Rastegari
OSLM AIFin
108
61
0
22 Apr 2024
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
Tyler Griggs
Xiaoxuan Liu
Jiaxiang Yu
Doyoung Kim
Wei-Lin Chiang
Alvin Cheung
Ion Stoica
116
18
0
22 Apr 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
74
4
0
22 Apr 2024
SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li
Yingbing Huang
Bowen Yang
Bharat Venkitesh
Acyr Locatelli
Hanchen Ye
Tianle Cai
Patrick Lewis
Deming Chen
VLM
143
210
0
22 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu Wang
171
98
0
22 Apr 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRM ALM
197
1,274
0
22 Apr 2024
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Wei Niu
Md. Musfiqur Rahman Sanim
Zhihao Shu
Jiexiong Guan
Xipeng Shen
Miao Yin
Gagan Agrawal
Bin Ren
69
6
0
21 Apr 2024
Large Language Models for Next Point-of-Interest Recommendation
Peibo Li
Maarten de Rijke
Hao Xue
Shuang Ao
Yang Song
Flora D. Salim
138
34
0
19 Apr 2024
MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model
Kang Zeng
Haowen Shi
Jiacheng Lin
Siyu Li
Jintao Cheng
Kaiwei Wang
Zhiyong Li
Kailun Yang
Mamba
84
8
0
19 Apr 2024
EdgeFusion: On-Device Text-to-Image Generation
Thibault Castells
Hyoung-Kyu Song
Tairen Piao
Shinkook Choi
Bo-Kyeong Kim
Hanyoung Yim
Changgwun Lee
Jae Gon Kim
Tae-Ho Kim
VLM
69
6
0
18 Apr 2024
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun
Zhuoming Chen
Xinyu Yang
Yuandong Tian
Beidi Chen
121
65
0
18 Apr 2024
Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier
A. Tsaris
Chengming Zhang
Xiao Wang
Junqi Yin
Siyan Liu
...
Jong Youl Choi
Mohamed Wahib
Dan Lu
Prasanna Balaprakash
Feiyi Wang
44
1
0
17 Apr 2024
Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models
Yushuo Chen
Tianyi Tang
Erge Xiang
Linjiang Li
Wayne Xin Zhao
Jing Wang
Yunpeng Chai
Ji-Rong Wen
22
2
0
17 Apr 2024
In-Context Learning State Vector with Inner and Momentum Optimization
Dongfang Li
Zhenyu Liu
Xinshuo Hu
Zetian Sun
Baotian Hu
Min Zhang
100
8
0
17 Apr 2024
ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours
Feiwen Zhu
Arkadiusz Nowaczynski
Rundong Li
Jie Xin
Yifei Song
Michal Marcinkiewicz
S. Eryilmaz
June Yang
M. Andersch
71
5
0
17 Apr 2024