ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.14135
  4. Cited By
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness
v1v2 (latest)

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM
ArXiv (abs)PDFHTML

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,508 papers shown
Title
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal
Nitin Kedia
Ashish Panwar
Jayashree Mohan
Nipun Kwatra
Bhargav S. Gulavani
Alexey Tumanov
Ramachandran Ramjee
102
187
0
04 Mar 2024
Large Language Model-Based Evolutionary Optimizer: Reasoning with
  elitism
Large Language Model-Based Evolutionary Optimizer: Reasoning with elitism
Shuvayan Brahmachary
Subodh M. Joshi
Aniruddha Panda
K. Koneripalli
A. Sagotra
Harshil Patel
Ankush Sharma
Ameya Dilip Jagtap
Kaushic Kalyanaraman
LRM
112
22
0
04 Mar 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan
Weiyun Wang
Zhe Chen
Xizhou Zhu
Lewei Lu
Tong Lu
Yu Qiao
Hongsheng Li
Jifeng Dai
Wenhai Wang
ViT
91
49
0
04 Mar 2024
On the Compressibility of Quantized Large Language Models
On the Compressibility of Quantized Large Language Models
Yu Mao
Weilan Wang
Hongchao Du
Nan Guan
Chun Jason Xue
MQ
73
6
0
03 Mar 2024
LM4OPT: Unveiling the Potential of Large Language Models in Formulating
  Mathematical Optimization Problems
LM4OPT: Unveiling the Potential of Large Language Models in Formulating Mathematical Optimization Problems
Tasnim Ahmed
Salimur Choudhury
73
12
0
02 Mar 2024
NoMAD-Attention: Efficient LLM Inference on CPUs Through
  Multiply-add-free Attention
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
Tianyi Zhang
Jonah Yi
Bowen Yao
Zhaozhuo Xu
Anshumali Shrivastava
MQ
104
7
0
02 Mar 2024
HeteGen: Heterogeneous Parallel Inference for Large Language Models on
  Resource-Constrained Devices
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
Xuanlei Zhao
Bin Jia
Hao Zhou
Ziming Liu
Shenggan Cheng
Yang You
36
5
0
02 Mar 2024
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text
  Summaries
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries
Zelalem Gero
Chandan Singh
Yiqing Xie
Sheng Zhang
Tristan Naumann
Jianfeng Gao
Hoifung Poon
ELMALM
61
4
0
01 Mar 2024
MediSwift: Efficient Sparse Pre-trained Biomedical Language Models
MediSwift: Efficient Sparse Pre-trained Biomedical Language Models
Vithursan Thangarasa
Mahmoud Salem
Shreyas Saxena
Kevin Leong
Joel Hestness
Sean Lie
MedIm
81
1
0
01 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for
  Efficient Language Models
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
130
135
0
29 Feb 2024
Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy,
  Advances, and Outlook
Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook
Xingchen Zou
Yibo Yan
Xixuan Hao
Yuehong Hu
Haomin Wen
...
Junbo Zhang
Yong Li
Tianrui Li
Yu Zheng
Yuxuan Liang
HAIAI4TS
104
45
0
29 Feb 2024
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Rodrigo Santos
Joao Silva
Luís Gomes
João Rodrigues
António Branco
92
10
0
29 Feb 2024
CLLMs: Consistency Large Language Models
CLLMs: Consistency Large Language Models
Siqi Kou
Lanxiang Hu
Zhe He
Zhijie Deng
Hao Zhang
137
34
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
Vincent Fortuin
286
22
0
28 Feb 2024
Stable LM 2 1.6B Technical Report
Stable LM 2 1.6B Technical Report
Marco Bellagente
J. Tow
Dakota Mahan
Duy Phung
Maksym Zhuravinskyi
...
Paulo Rocha
Harry Saini
H. Teufel
Niccoló Zanichelli
Carlos Riquelme
OSLM
104
58
0
27 Feb 2024
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi
Runpei Dong
Shaochen Zhang
Haoran Geng
Chunrui Han
Zheng Ge
Li Yi
Kaisheng Ma
200
63
0
27 Feb 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents
Evaluating Very Long-Term Conversational Memory of LLM Agents
A. Maharana
Dong-Ho Lee
Sergey Tulyakov
Mohit Bansal
Francesco Barbieri
Yuwei Fang
LLMAG
86
81
0
27 Feb 2024
AmbigNLG: Addressing Task Ambiguity in Instruction for NLG
AmbigNLG: Addressing Task Ambiguity in Instruction for NLG
Ayana Niwa
Hayate Iso
90
5
0
27 Feb 2024
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain
  Question Answering
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering
Yuhao Wang
Ruiyang Ren
Junyi Li
Wayne Xin Zhao
Jing Liu
Ji-Rong Wen
RALM
106
13
0
27 Feb 2024
Training-Free Long-Context Scaling of Large Language Models
Training-Free Long-Context Scaling of Large Language Models
Chen An
Fei Huang
Jun Zhang
Shansan Gong
Xipeng Qiu
Chang Zhou
Lingpeng Kong
ALMLRM
104
42
0
27 Feb 2024
Actions Speak Louder than Words: Trillion-Parameter Sequential
  Transducers for Generative Recommendations
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
Jiaqi Zhai
Lucy Liao
Xing Liu
Yueming Wang
Rui Li
...
Zhaojie Gong
Fangda Gu
Michael He
Yin-Hua Lu
Yu Shi
OffRL
102
56
0
27 Feb 2024
Investigating the Effectiveness of HyperTuning via Gisting
Investigating the Effectiveness of HyperTuning via Gisting
Jason Phang
104
1
0
26 Feb 2024
Why Transformers Need Adam: A Hessian Perspective
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhimin Luo
126
57
0
26 Feb 2024
Look Before You Leap: Towards Decision-Aware and Generalizable
  Tool-Usage for Large Language Models
Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models
Anchun Gui
Jian Li
Yong Dai
Nan Du
Han Xiao
45
1
0
26 Feb 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
233
91
0
26 Feb 2024
GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning
GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning
Han Zou
Qiyang Zhao
Lina Bariah
Yu Tian
M. Bennis
S. Lasaulce
156
14
0
26 Feb 2024
Seamless Human Motion Composition with Blended Positional Encodings
Seamless Human Motion Composition with Blended Positional Encodings
German Barquero
Sergio Escalera
Cristina Palmero
DiffM
91
34
0
23 Feb 2024
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and
  Two-Phase Partition
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Lu Ye
Ze Tao
Yong Huang
Yang Li
97
34
0
23 Feb 2024
Second-Order Fine-Tuning without Pain for LLMs:A Hessian Informed Zeroth-Order Optimizer
Second-Order Fine-Tuning without Pain for LLMs:A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao
Sizhe Dang
Haishan Ye
Guang Dai
Yi Qian
Ivor W.Tsang
151
13
0
23 Feb 2024
MobileLLM: Optimizing Sub-billion Parameter Language Models for
  On-Device Use Cases
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Zechun Liu
Changsheng Zhao
Forrest N. Iandola
Chen Lai
Yuandong Tian
...
Ernie Chang
Yangyang Shi
Raghuraman Krishnamoorthi
Liangzhen Lai
Vikas Chandra
ALM
137
102
0
22 Feb 2024
RelayAttention for Efficient Large Language Model Serving with Long
  System Prompts
RelayAttention for Efficient Large Language Model Serving with Long System Prompts
Lei Zhu
Xinjiang Wang
Wayne Zhang
Rynson W. H. Lau
88
8
0
22 Feb 2024
Improving Language Understanding from Screenshots
Improving Language Understanding from Screenshots
Tianyu Gao
Zirui Wang
Adithya Bhaskar
Danqi Chen
VLM
82
10
0
21 Feb 2024
SDXL-Lightning: Progressive Adversarial Diffusion Distillation
SDXL-Lightning: Progressive Adversarial Diffusion Distillation
Shanchuan Lin
Anran Wang
Xiao Yang
147
134
0
21 Feb 2024
$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens
∞\infty∞Bench: Extending Long Context Evaluation Beyond 100K Tokens
Xinrong Zhang
Yingfa Chen
Shengding Hu
Zihang Xu
Junhao Chen
...
Xu Han
Zhen Leng Thai
Shuo Wang
Zhiyuan Liu
Maosong Sun
RALMLRM
112
195
0
21 Feb 2024
ToDo: Token Downsampling for Efficient Generation of High-Resolution
  Images
ToDo: Token Downsampling for Efficient Generation of High-Resolution Images
Ethan Smith
Nayan Saxena
Aninda Saha
DiffM
45
6
0
21 Feb 2024
CAMELoT: Towards Large Language Models with Training-Free Consolidated
  Associative Memory
CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory
Zexue He
Leonid Karlinsky
Donghyun Kim
Julian McAuley
Dmitry Krotov
Rogerio Feris
KELMRALM
84
11
0
21 Feb 2024
How do Hyenas deal with Human Speech? Speech Recognition and Translation
  with ConfHyena
How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
100
1
0
20 Feb 2024
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for
  Single or Sparse-view 3D Object Reconstruction
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
Shitao Tang
Jiacheng Chen
Dilin Wang
Chengzhou Tang
Fuyang Zhang
Yuchen Fan
Vikas Chandra
Yasutaka Furukawa
Rakesh Ranjan
146
75
0
20 Feb 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech
  Recognition, Translation, and Language Identification
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
126
25
0
20 Feb 2024
Locality-Sensitive Hashing-Based Efficient Point Transformer with
  Applications in High-Energy Physics
Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics
Siqi Miao
Zhiyuan Lu
Mia Liu
Javier Duarte
Pan Li
127
6
0
19 Feb 2024
FIPO: Free-form Instruction-oriented Prompt Optimization with Preference
  Dataset and Modular Fine-tuning Schema
FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema
Junru Lu
Siyu An
Min Zhang
Yulan He
Di Yin
Xing Sun
127
2
0
19 Feb 2024
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
Zhuoming Chen
Avner May
Ruslan Svirschevski
Yuhsun Huang
Max Ryabinin
Zhihao Jia
Beidi Chen
104
52
0
19 Feb 2024
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for
  Ultra-Low-Parameter Fine-Tuning of Large Language Models
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models
Yifan Yang
Jiajun Zhou
Ngai Wong
Zheng Zhang
73
8
0
18 Feb 2024
Language Models as Science Tutors
Language Models as Science Tutors
Alexis Chevalier
Jiayi Geng
Alexander Wettig
Howard Chen
Sebastian Mizera
...
Jiatong Yu
Jun-Jie Zhu
Z. Ren
Sanjeev Arora
Danqi Chen
ELM
72
13
0
16 Feb 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM
  Instruction-Tuning
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
Ming Li
Lichang Chen
Jiuhai Chen
Shwai He
Jiuxiang Gu
Dinesh Manocha
149
58
0
15 Feb 2024
Multi-word Tokenization for Sequence Compression
Multi-word Tokenization for Sequence Compression
Leonidas Gee
Leonardo Rigutini
Marco Ernandes
Andrea Zugarini
65
10
0
15 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A
  Survey
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
116
58
0
15 Feb 2024
InstructGraph: Boosting Large Language Models via Graph-centric
  Instruction Tuning and Preference Alignment
InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment
Jianing Wang
Junda Wu
Yupeng Hou
Yao Liu
Ming Gao
Julian McAuley
96
35
0
13 Feb 2024
World Model on Million-Length Video And Language With Blockwise RingAttention
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu
Wilson Yan
Matei A. Zaharia
Pieter Abbeel
VGen
140
85
0
13 Feb 2024
FAST: Factorizable Attention for Speeding up Transformers
FAST: Factorizable Attention for Speeding up Transformers
Armin Gerami
Monte Hoover
P. S. Dulepet
R. Duraiswami
37
0
0
12 Feb 2024
Previous
123...202122...293031
Next