ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.14135
  4. Cited By
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness
v1v2 (latest)

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM
ArXiv (abs)PDFHTML

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,508 papers shown
Title
PointLLM: Empowering Large Language Models to Understand Point Clouds
PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu
Xiaolong Wang
Tai Wang
Yilun Chen
Jiangmiao Pang
Dahua Lin
MLLM
129
185
0
31 Aug 2023
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked
  Prefills
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
Amey Agrawal
Ashish Panwar
Jayashree Mohan
Nipun Kwatra
Bhargav S. Gulavani
Ramachandran Ramjee
AI4TSLRM
100
110
0
31 Aug 2023
A General-Purpose Self-Supervised Model for Computational Pathology
A General-Purpose Self-Supervised Model for Computational Pathology
Richard J. Chen
Tong Ding
Ming Y. Lu
Drew F. K. Williamson
Guillaume Jaume
...
Judy J. Wang
Walt Williams
L. Le
Georg Gerber
Faisal Mahmood
MedIm
134
44
0
29 Aug 2023
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
Qingyue Wang
Y. Fu
Yanan Cao
Zhiliang Tian
Shi Wang
Dacheng Tao
LLMAGKELMRALM
167
29
0
29 Aug 2023
Multiscale Contextual Learning for Speech Emotion Recognition in
  Emergency Call Center Conversations
Multiscale Contextual Learning for Speech Emotion Recognition in Emergency Call Center Conversations
Théo Deschamps-Berger
L. Lamel
Laurence Devillers
56
2
0
28 Aug 2023
MedAlign: A Clinician-Generated Dataset for Instruction Following with
  Electronic Medical Records
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Scott L. Fleming
Alejandro Lozano
W. Haberkorn
Jenelle A. Jindal
E. Reis
...
Jonathan Chen
Keith Morse
Emma Brunskill
Jason Alan Fries
N. Shah
LM&MA
83
60
0
27 Aug 2023
Aligning Language Models with Offline Learning from Human Feedback
Aligning Language Models with Offline Learning from Human Feedback
Jian Hu
Li Tao
J. Yang
Chandler Zhou
ALMOffRL
90
7
0
23 Aug 2023
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data
  Selection for Instruction Tuning
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
Ming Li
Yong Zhang
Zhitao Li
Jiuhai Chen
Lichang Chen
Ning Cheng
Jianzong Wang
Dinesh Manocha
Jing Xiao
129
211
0
23 Aug 2023
How Much Temporal Long-Term Context is Needed for Action Segmentation?
How Much Temporal Long-Term Context is Needed for Action Segmentation?
Emad Bahrami Rad
Gianpiero Francesca
Juergen Gall
ViT
87
27
0
22 Aug 2023
Instruction Tuning for Large Language Models: A Survey
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Leilei Gan
Guoyin Wang
LM&MA
108
609
0
21 Aug 2023
LegalBench: A Collaboratively Built Benchmark for Measuring Legal
  Reasoning in Large Language Models
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
Neel Guha
Julian Nyarko
Daniel E. Ho
Christopher Ré
Adam Chilton
...
Spencer Williams
Sunny G. Gandhi
Tomer Zur
Varun J. Iyer
Zehua Li
AILawLRMELM
82
182
0
20 Aug 2023
LMTuner: An user-friendly and highly-integrable Training Framework for
  fine-tuning Large Language Models
LMTuner: An user-friendly and highly-integrable Training Framework for fine-tuning Large Language Models
Yixuan Weng
Zhiqi Wang
Huanxuan Liao
Shizhu He
Shengping Liu
Kang Liu
Jun Zhao
64
3
0
20 Aug 2023
MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain
  Conversation
MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
Junru Lu
Siyu An
Mingbao Lin
Gabriele Pergola
Yulan He
Di Yin
Xing Sun
Yunsheng Wu
127
40
0
16 Aug 2023
OctoPack: Instruction Tuning Code Large Language Models
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff
Qian Liu
A. Zebaze
Qinkai Zheng
Binyuan Hui
Terry Yue Zhuo
Swayam Singh
Xiangru Tang
Leandro von Werra
Shayne Longpre
VLMALM
146
140
0
14 Aug 2023
Pairing interacting protein sequences using masked language modeling
Pairing interacting protein sequences using masked language modeling
Umberto Lupo
Damiano Sgarbossa
Anne-Florence Bitbol
50
13
0
14 Aug 2023
Large Language Models for Telecom: Forthcoming Impact on the Industry
Large Language Models for Telecom: Forthcoming Impact on the Industry
Ali Maatouk
Nicola Piovesan
Fadhel Ayed
Antonio De Domenico
Merouane Debbah
90
56
0
11 Aug 2023
Accelerating LLM Inference with Staged Speculative Decoding
Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector
Christal Re
86
113
0
08 Aug 2023
Continual Pre-Training of Large Language Models: How to (re)warm your
  model?
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Kshitij Gupta
Benjamin Thérien
Adam Ibrahim
Mats L. Richter
Quentin G. Anthony
Eugene Belilovsky
Irina Rish
Timothée Lesort
KELM
127
107
0
08 Aug 2023
DiffSynth: Latent In-Iteration Deflickering for Realistic Video
  Synthesis
DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis
Zhongjie Duan
Lizhou You
Chengyu Wang
Cen Chen
Ziheng Wu
Weining Qian
Jun Huang
DiffM
86
8
0
07 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
76
3
0
07 Aug 2023
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models
  Fine-tuning
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
Longteng Zhang
Lin Zhang
Shaoshuai Shi
Xiaowen Chu
Yue Liu
AI4CE
72
107
0
07 Aug 2023
Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for
  Super-Resolution
Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution
Yong Liu
Hang Dong
Boyang Liang
Song Liu
Qingji Dong
Kai Chen
Fangmin Chen
Lean Fu
Fei Wang
ViT
51
14
0
05 Aug 2023
JIANG: Chinese Open Foundation Language Model
JIANG: Chinese Open Foundation Language Model
Qinhua Duan
Wenchao Gu
Yujia Chen
Wenxin Mao
Zewen Tian
Huibing Cao
ALM
37
0
0
01 Aug 2023
TransNormerLLM: A Faster and Better Large Language Model with Improved
  TransNormer
TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
Zhen Qin
Dong Li
Weigao Sun
Weixuan Sun
Xuyang Shen
...
Yunshen Wei
Baohong Lv
Xiao Luo
Yu Qiao
Yiran Zhong
85
18
0
27 Jul 2023
Multilingual Code Co-Evolution Using Large Language Models
Multilingual Code Co-Evolution Using Large Language Models
Jiyang Zhang
Pengyu Nie
Junyi Jessy Li
Miloš Gligorić
81
25
0
27 Jul 2023
Global k-Space Interpolation for Dynamic MRI Reconstruction using Masked
  Image Modeling
Global k-Space Interpolation for Dynamic MRI Reconstruction using Masked Image Modeling
Jia Pan
Suprosanna Shit
Özgün Turgut
Wenqi Huang
Hongwei Bran Li
Nil Stolt Ansó
Thomas Kustner
Kerstin Hammernik
Daniel Rueckert
65
9
0
24 Jul 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language
  Models
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELMALM
119
156
0
20 Jul 2023
FABRIC: Personalizing Diffusion Models with Iterative Feedback
FABRIC: Personalizing Diffusion Models with Iterative Feedback
Dimitri von Rütte
Elisabetta Fedele
Jonathan Thomm
Lukas Wolf
71
13
0
19 Jul 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work
  Partitioning
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
122
1,336
0
17 Jul 2023
Retentive Network: A Successor to Transformer for Large Language Models
Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
LRM
187
347
0
17 Jul 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding
  and Generation
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLMVGen
123
275
0
13 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For
  Transformer-based Language Models
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
110
45
0
12 Jul 2023
A Comprehensive Overview of Large Language Models
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Mian
OffRL
253
622
0
12 Jul 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and
  Resolution
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani
Basil Mustafa
Josip Djolonga
Jonathan Heek
Matthias Minderer
...
Avital Oliver
Piotr Padlewski
A. Gritsenko
Mario Luvcić
N. Houlsby
ViT
188
119
0
12 Jul 2023
ReLoRA: High-Rank Training Through Low-Rank Updates
ReLoRA: High-Rank Training Through Low-Rank Updates
Vladislav Lialin
Namrata Shivagunde
Sherin Muckatira
Anna Rumshisky
BDL
101
117
0
11 Jul 2023
Large Language Models as General Pattern Machines
Large Language Models as General Pattern Machines
Suvir Mirchandani
F. Xia
Peter R. Florence
Brian Ichter
Danny Driess
Montse Gonzalez Arenas
Kanishka Rao
Dorsa Sadigh
Andy Zeng
LLMAG
133
201
0
10 Jul 2023
Large Language Models as Batteries-Included Zero-Shot ESCO Skills
  Matchers
Large Language Models as Batteries-Included Zero-Shot ESCO Skills Matchers
Benjamin Clavié
Guillaume Soulié
93
13
0
07 Jul 2023
Lost in the Middle: How Language Models Use Long Contexts
Lost in the Middle: How Language Models Use Long Contexts
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Percy Liang
RALM
132
1,660
0
06 Jul 2023
Focused Transformer: Contrastive Training for Context Scaling
Focused Transformer: Contrastive Training for Context Scaling
Szymon Tworkowski
Konrad Staniszewski
Mikolaj Pacek
Yuhuai Wu
Henryk Michalewski
Piotr Milo's
76
141
0
06 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
114
165
0
05 Jul 2023
SMILE: Evaluation and Domain Adaptation for Social Media Language
  Understanding
SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
Vasilisa Bashlovkina
Riley Matthews
Zhaobin Kuang
Simon Baumgartner
Michael Bendersky
66
5
0
30 Jun 2023
RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark
RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark
Federico Berto
Chuanbo Hua
J. Park
Laurin Luttmann
Yining Ma
...
Guojie Song
Changhyun Kwon
Kevin Tierney
Lin Xie
Jinkyoo Park
OffRL
114
29
0
29 Jun 2023
FLuRKA: Fast and accurate unified Low-Rank & Kernel Attention
FLuRKA: Fast and accurate unified Low-Rank & Kernel Attention
Ahan Gupta
Hao Guo
Yueming Yuan
Yan-Quan Zhou
Charith Mendis
71
3
0
27 Jun 2023
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide
  Resolution
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Eric N. D. Nguyen
Michael Poli
Marjan Faizi
A. Thomas
Callum Birch-Sykes
...
Stefano Massaroli
Yoshua Bengio
Stefano Ermon
S. Baccus
Christopher Ré
MedIm
98
259
0
27 Jun 2023
Extending Context Window of Large Language Models via Positional
  Interpolation
Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen
Sherman Wong
Liangjian Chen
Yuandong Tian
167
543
0
27 Jun 2023
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
Weiming Zhuang
Chen Chen
Lingjuan Lyu
Chong Chen
Yaochu Jin
Lingjuan Lyu
AIFinAI4CE
223
98
0
27 Jun 2023
DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species
  Genome
DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
Zhihan Zhou
Yanrong Ji
Weijian Li
Pratik Dutta
R. Davuluri
Han Liu
100
195
0
26 Jun 2023
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large
  Language Models
H2_22​O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
173
314
0
24 Jun 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language
  Models
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
Neel Jain
Khalid Saifullah
Yuxin Wen
John Kirchenbauer
Manli Shu
Aniruddha Saha
Micah Goldblum
Jonas Geiping
Tom Goldstein
ALMELM
102
23
0
23 Jun 2023
LightGlue: Local Feature Matching at Light Speed
LightGlue: Local Feature Matching at Light Speed
Philipp Lindenberger
Paul-Edouard Sarlin
Marc Pollefeys
3DVVLM
132
451
0
23 Jun 2023
Previous
123...2728293031
Next