ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.14135
  4. Cited By
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness
v1v2 (latest)

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM
ArXiv (abs)PDFHTML

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,508 papers shown
Title
Suppressing Pink Elephants with Direct Principle Feedback
Suppressing Pink Elephants with Direct Principle Feedback
Louis Castricato
Nathan Lile
Suraj Anand
Hailey Schoelkopf
Siddharth Verma
Stella Biderman
104
12
0
12 Feb 2024
Anchor-based Large Language Models
Anchor-based Large Language Models
Jianhui Pang
Fanghua Ye
Derek F. Wong
Xin He
Wanshun Chen
Longyue Wang
KELM
156
10
0
12 Feb 2024
The I/O Complexity of Attention, or How Optimal is Flash Attention?
The I/O Complexity of Attention, or How Optimal is Flash Attention?
Barna Saha
Christopher Ye
58
5
0
12 Feb 2024
Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning
  Framework for Dialogue
Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue
Jian Wang
Chak Tou Leong
Jiashuo Wang
Dongding Lin
Wenjie Li
Xiao-Yong Wei
85
9
0
10 Feb 2024
On the Efficacy of Eviction Policy for Key-Value Constrained Generative
  Language Model Inference
On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference
Siyu Ren
Kenny Q. Zhu
75
30
0
09 Feb 2024
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù
Zdeněk Kasner
Siva Reddy
98
77
0
08 Feb 2024
Memory Consolidation Enables Long-Context Video Understanding
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balavzević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
195
27
0
08 Feb 2024
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for
  Transformers
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat
Sayed Mohammad Vakilzadeh Hatefi
Maximilian Dreyer
Aakriti Jain
Thomas Wiegand
Sebastian Lapuschkin
Wojciech Samek
84
37
0
08 Feb 2024
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Jordan Juravsky
Bradley Brown
Ryan Ehrlich
Daniel Y. Fu
Christopher Ré
Azalia Mirhoseini
131
39
0
07 Feb 2024
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and
  Lattice Codebooks
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng
Jerry Chee
Qingyao Sun
Volodymyr Kuleshov
Christopher De Sa
MQ
215
129
0
06 Feb 2024
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax
  Mimicry
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang
Kush S. Bhatia
Hermann Kumbong
Christopher Ré
80
54
0
06 Feb 2024
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Quan-Sen Sun
Jinsheng Wang
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Xinlong Wang
VLMCLIPMLLM
137
49
0
06 Feb 2024
CAST: Clustering Self-Attention using Surrogate Tokens for Efficient
  Transformers
CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers
Adjorn van Engelenhoven
Nicola Strisciuglio
Estefanía Talavera
109
1
0
06 Feb 2024
Multi-line AI-assisted Code Authoring
Multi-line AI-assisted Code Authoring
Omer Dunay
Daniel Cheng
Adam Tait
Parth Thakkar
Peter C. Rigby
...
Arun Ganesan
C. Maddila
V. Murali
Ali Tayyebi
Nachiappan Nagappan
KELM
155
15
0
06 Feb 2024
Return-Aligned Decision Transformer
Return-Aligned Decision Transformer
Tsunehiko Tanaka
Kenshi Abe
Kaito Ariu
Tetsuro Morimura
Edgar Simo-Serra
OffRL
176
1
0
06 Feb 2024
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse
  LLMs
ReLU2^22 Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang
Yixin Song
Guanghui Yu
Xu Han
Yankai Lin
Chaojun Xiao
Chenyang Song
Zhiyuan Liu
Zeyu Mi
Maosong Sun
80
36
0
06 Feb 2024
Progress and Opportunities of Foundation Models in Bioinformatics
Progress and Opportunities of Foundation Models in Bioinformatics
Qing Li
Zhihang Hu
Yixuan Wang
Lei Li
Yimin Fan
Irwin King
Le Song
Yu Li
AI4CE
85
18
0
06 Feb 2024
A Survey on Transformer Compression
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
152
35
0
05 Feb 2024
Key-Graph Transformer for Image Restoration
Key-Graph Transformer for Image Restoration
Bin Ren
Yawei Li
Christos Sakaridis
Rakesh Ranjan
Mengyuan Liu
Rita Cucchiara
Luc Van Gool
N. Sebe
134
1
0
04 Feb 2024
Learning to Understand: Identifying Interactions via the Möbius
  Transform
Learning to Understand: Identifying Interactions via the Möbius Transform
Justin Singh Kang
Yigit Efe Erginbas
Landon Butler
Ramtin Pedarsani
Kannan Ramchandran
92
4
0
04 Feb 2024
Enhancing Transformer RNNs with Multiple Temporal Perspectives
Enhancing Transformer RNNs with Multiple Temporal Perspectives
Razvan-Gabriel Dumitru
Darius Peteleaza
Mihai Surdeanu
AI4TS
41
2
0
04 Feb 2024
DenseFormer: Enhancing Information Flow in Transformers via Depth
  Weighted Averaging
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
Matteo Pagliardini
Amirkeivan Mohtashami
François Fleuret
Martin Jaggi
106
9
0
04 Feb 2024
Beyond the Limits: A Survey of Techniques to Extend the Context Length
  in Large Language Models
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Xindi Wang
Mahsa Salmani
Parsa Omidi
Xiangyu Ren
Mehdi Rezagholizadeh
A. Eshaghi
LRM
91
48
0
03 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead
  Decoding
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
202
164
0
03 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
304
570
0
02 Feb 2024
Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward
Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward
Arnav Chavan
Raghav Magazine
Shubham Kushwaha
M. Debbah
Deepak Gupta
68
23
0
02 Feb 2024
DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models
DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models
Mohammadreza Pourreza
Davood Rafiei
72
30
0
02 Feb 2024
Compositional Generative Modeling: A Single Model is Not All You Need
Compositional Generative Modeling: A Single Model is Not All You Need
Yilun Du
L. Kaelbling
PINNGAN
142
25
0
02 Feb 2024
Nomic Embed: Training a Reproducible Long Context Text Embedder
Nomic Embed: Training a Reproducible Long Context Text Embedder
Zach Nussbaum
John X. Morris
Brandon Duderstadt
Andriy Mulyar
116
124
0
02 Feb 2024
Repeat After Me: Transformers are Better than State Space Models at
  Copying
Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
176
95
0
01 Feb 2024
Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight
  in the Real World for Meeting Summarization?
Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?
Xue-Yong Fu
Md Tahmid Rahman Laskar
Elena Khasanova
Cheng-Hsiung Chen
TN ShashiBhushan
ALM
96
23
0
01 Feb 2024
Hybrid Quantum Vision Transformers for Event Classification in High
  Energy Physics
Hybrid Quantum Vision Transformers for Event Classification in High Energy Physics
Eyup B. Unlu
Marçal Comajoan Cara
Gopal Ramesh Dahale
Zhongtian Dong
Roy T. Forestano
...
Daniel Justice
Kyoungchul Kong
Tom Magorsch
Konstantin T. Matchev
Katia Matcheva
94
12
0
01 Feb 2024
Comparative Study of Large Language Model Architectures on Frontier
Comparative Study of Large Language Model Architectures on Frontier
Shantia Yarahmadian
A. Bose
Guojing Cong
Richard Yamada
Quentin Anthony
ELM
83
7
0
01 Feb 2024
Superfiltering: Weak-to-Strong Data Filtering for Fast
  Instruction-Tuning
Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
Ming Li
Yong Zhang
Shwai He
Zhitao Li
Hongyu Zhao
Jianzong Wang
Ning Cheng
Dinesh Manocha
102
80
0
01 Feb 2024
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Parth Sarthi
Salman Abdullah
Aditi Tuli
Shubh Khanna
Anna Goldie
Christopher D. Manning
RALM
109
148
0
31 Jan 2024
LongAlign: A Recipe for Long Context Alignment of Large Language Models
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Yushi Bai
Xin Lv
Jiajie Zhang
Yuze He
Ji Qi
Lei Hou
Jie Tang
Yuxiao Dong
Juanzi Li
ALM
100
53
0
31 Jan 2024
ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
Shiwei Liu
Guanchen Tao
Yifei Zou
Derek Chow
Zichen Fan
Kauna Lei
Bangfei Pan
Dennis Sylvester
Gregory Kielian
Mehdi Saligane
69
8
0
31 Jan 2024
LOCOST: State-Space Models for Long Document Abstractive Summarization
LOCOST: State-Space Models for Long Document Abstractive Summarization
Florian Le Bronnec
Song Duong
Mathieu Ravaut
Alexandre Allauzen
Nancy F. Chen
Vincent Guigue
Alberto Lumbreras
Laure Soulier
Patrick Gallinari
139
10
0
31 Jan 2024
Weaver: Foundation Models for Creative Writing
Weaver: Foundation Models for Creative Writing
Tiannan Wang
Jiamin Chen
Qingrui Jia
Shuai Wang
Ruoyu Fang
...
Xiaohua Xu
Ningyu Zhang
Huajun Chen
Yuchen Eleanor Jiang
Wangchunshu Zhou
99
20
0
30 Jan 2024
YTCommentQA: Video Question Answerability in Instructional Videos
YTCommentQA: Video Question Answerability in Instructional Videos
Saelyne Yang
Sunghyun Park
Yunseok Jang
Moontae Lee
114
3
0
30 Jan 2024
H2O-Danube-1.8B Technical Report
H2O-Danube-1.8B Technical Report
Philipp Singer
Pascal Pfeiffer
Yauhen Babakhin
Maximilian Jeblick
Nischay Dhankhar
Gabor Fodor
SriSatish Ambati
VLM
60
8
0
30 Jan 2024
T3: Transparent Tracking & Triggering for Fine-grained Overlap of
  Compute & Collectives
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
Suchita Pati
Shaizeen Aga
Mahzabeen Islam
Nuwan Jayasena
Matthew D. Sinclair
75
16
0
30 Jan 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on
  E-Branchformer
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Yifan Peng
Jinchuan Tian
William Chen
Siddhant Arora
Brian Yan
...
Kwanghee Choi
Jiatong Shi
Xuankai Chang
Jee-weon Jung
Shinji Watanabe
VLMOSLM
103
54
0
30 Jan 2024
TeenyTinyLlama: open-source tiny language models trained in Brazilian
  Portuguese
TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese
N. Corrêa
Sophia Falk
Shiza Fatimah
Aniket Sen
N. D. Oliveira
89
9
0
30 Jan 2024
Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models
Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models
Zhongjie Duan
Chengyu Wang
Cen Chen
Weining Qian
Jun Huang
DiffM
51
7
0
29 Jan 2024
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun
Youngmin Ro
ViT
136
36
0
29 Jan 2024
Hardware Phi-1.5B: A Large Language Model Encodes Hardware Domain
  Specific Knowledge
Hardware Phi-1.5B: A Large Language Model Encodes Hardware Domain Specific Knowledge
Weimin Fu
Shijie Li
Yifang Zhao
Haocheng Ma
R. Dutta
Xuan Zhang
Kaichen Yang
Yier Jin
Xiaolong Guo
ALM
98
10
0
27 Jan 2024
Improving Medical Reasoning through Retrieval and Self-Reflection with
  Retrieval-Augmented Large Language Models
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models
Minbyul Jeong
Jiwoong Sohn
Mujeen Sung
Jaewoo Kang
120
34
0
27 Jan 2024
PROXYQA: An Alternative Framework for Evaluating Long-Form Text
  Generation with Large Language Models
PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models
Haochen Tan
Zhijiang Guo
Zhan Shi
Lu Xu
Zhili Liu
...
Xiaoguang Li
Yasheng Wang
Lifeng Shang
Qun Liu
Linqi Song
101
16
0
26 Jan 2024
Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness
Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness
Samaneh Shafee
A. Bessani
Pedro M. Ferreira
57
22
0
26 Jan 2024
Previous
123...212223...293031
Next