ResearchTrend.AI
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM
arXiv: 2205.14135

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,439 papers shown
Title
MediSwift: Efficient Sparse Pre-trained Biomedical Language Models
Vithursan Thangarasa
Mahmoud Salem
Shreyas Saxena
Kevin Leong
Joel Hestness
Sean Lie
MedIm
40
1
0
01 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
63
117
0
29 Feb 2024
Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook
Xingchen Zou
Yibo Yan
Xixuan Hao
Yuehong Hu
Haomin Wen
...
Junbo Zhang
Yong Li
Tianrui Li
Yu Zheng
Keli Zhang
HAI
AI4TS
57
37
0
29 Feb 2024
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Rodrigo Santos
Joao Silva
Luís Gomes
João Rodrigues
António Branco
48
10
0
29 Feb 2024
CLLMs: Consistency Large Language Models
Siqi Kou
Lanxiang Hu
Zhe He
Zhijie Deng
Hao Zhang
52
28
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
56
17
0
28 Feb 2024
Stable LM 2 1.6B Technical Report
Marco Bellagente
J. Tow
Dakota Mahan
Duy Phung
Maksym Zhuravinskyi
...
Paulo Rocha
Harry Saini
H. Teufel
Niccoló Zanichelli
Carlos Riquelme
OSLM
54
52
0
27 Feb 2024
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi
Runpei Dong
Shaochen Zhang
Haoran Geng
Chunrui Han
Zheng Ge
Li Yi
Kaisheng Ma
49
52
0
27 Feb 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents
A. Maharana
Dong-Ho Lee
Sergey Tulyakov
Mohit Bansal
Francesco Barbieri
Yuwei Fang
LLMAG
29
68
0
27 Feb 2024
AmbigNLG: Addressing Task Ambiguity in Instruction for NLG
Ayana Niwa
Hayate Iso
38
4
0
27 Feb 2024
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering
Yuhao Wang
Ruiyang Ren
Junyi Li
Wayne Xin Zhao
Jing Liu
Ji-Rong Wen
RALM
45
9
0
27 Feb 2024
Training-Free Long-Context Scaling of Large Language Models
Chen An
Fei Huang
Jun Zhang
Shansan Gong
Xipeng Qiu
Chang Zhou
Lingpeng Kong
ALM
LRM
45
35
0
27 Feb 2024
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
Jiaqi Zhai
Lucy Liao
Xing Liu
Yueming Wang
Rui Li
...
Zhaojie Gong
Fangda Gu
Michael He
Yin-Hua Lu
Yu Shi
OffRL
34
50
0
27 Feb 2024
Investigating the Effectiveness of HyperTuning via Gisting
Jason Phang
51
0
0
26 Feb 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhimin Luo
40
43
0
26 Feb 2024
Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models
Anchun Gui
Jian Li
Yong Dai
Nan Du
Han Xiao
41
1
0
26 Feb 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
61
82
0
26 Feb 2024
GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning
Han Zou
Qiyang Zhao
Lina Bariah
Yu Tian
M. Bennis
S. Lasaulce
101
12
0
26 Feb 2024
Seamless Human Motion Composition with Blended Positional Encodings
Germán Barquero
Sergio Escalera
Cristina Palmero
DiffM
56
30
0
23 Feb 2024
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Lu Ye
Ze Tao
Yong Huang
Yang Li
34
26
0
23 Feb 2024
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao
Sizhe Dang
Haishan Ye
Guang Dai
Yi Qian
Ivor W. Tsang
68
8
0
23 Feb 2024
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Zechun Liu
Changsheng Zhao
Forrest N. Iandola
Chen Lai
Yuandong Tian
...
Ernie Chang
Yangyang Shi
Raghuraman Krishnamoorthi
Liangzhen Lai
Vikas Chandra
ALM
46
78
0
22 Feb 2024
RelayAttention for Efficient Large Language Model Serving with Long System Prompts
Lei Zhu
Xinjiang Wang
Wayne Zhang
Rynson W. H. Lau
35
6
0
22 Feb 2024
Improving Language Understanding from Screenshots
Tianyu Gao
Zirui Wang
Adithya Bhaskar
Danqi Chen
VLM
43
10
0
21 Feb 2024
SDXL-Lightning: Progressive Adversarial Diffusion Distillation
Shanchuan Lin
Anran Wang
Xiao Yang
42
119
0
21 Feb 2024
$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens
Xinrong Zhang
Yingfa Chen
Shengding Hu
Zihang Xu
Junhao Chen
...
Xu Han
Zhen Leng Thai
Shuo Wang
Zhiyuan Liu
Maosong Sun
RALM
LRM
50
154
0
21 Feb 2024
ToDo: Token Downsampling for Efficient Generation of High-Resolution Images
Ethan Smith
Nayan Saxena
Aninda Saha
DiffM
40
5
0
21 Feb 2024
CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory
Zexue He
Leonid Karlinsky
Donghyun Kim
Julian McAuley
Dmitry Krotov
Rogerio Feris
KELM
RALM
43
10
0
21 Feb 2024
How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
48
1
0
20 Feb 2024
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
Shitao Tang
Jiacheng Chen
Dilin Wang
Chengzhou Tang
Fuyang Zhang
Yuchen Fan
Vikas Chandra
Yasutaka Furukawa
Rakesh Ranjan
43
67
0
20 Feb 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
53
17
0
20 Feb 2024
Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics
Siqi Miao
Zhiyuan Lu
Mia Liu
Javier Duarte
Pan Li
66
4
0
19 Feb 2024
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
Zhuoming Chen
Avner May
Ruslan Svirschevski
Yuhsun Huang
Max Ryabinin
Zhihao Jia
Beidi Chen
53
41
0
19 Feb 2024
FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema
Junru Lu
Siyu An
Min Zhang
Yulan He
Di Yin
Xing Sun
60
2
0
19 Feb 2024
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models
Yifan Yang
Jiajun Zhou
Ngai Wong
Zheng Zhang
31
7
0
18 Feb 2024
Language Models as Science Tutors
Alexis Chevalier
Jiayi Geng
Alexander Wettig
Howard Chen
Sebastian Mizera
...
Jiatong Yu
Jun-Jie Zhu
Z. Ren
Sanjeev Arora
Danqi Chen
ELM
35
11
0
16 Feb 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
Ming Li
Lichang Chen
Jiuhai Chen
Shwai He
Jiuxiang Gu
Dinesh Manocha
34
53
0
15 Feb 2024
Multi-word Tokenization for Sequence Compression
Leonidas Gee
Leonardo Rigutini
Marco Ernandes
Andrea Zugarini
18
8
0
15 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
46
48
0
15 Feb 2024
InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment
Jianing Wang
Junda Wu
Yupeng Hou
Yao Liu
Ming Gao
Julian McAuley
35
32
0
13 Feb 2024
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu
Wilson Yan
Matei A. Zaharia
Pieter Abbeel
VGen
44
64
0
13 Feb 2024
FAST: Factorizable Attention for Speeding up Transformers
Armin Gerami
Monte Hoover
P. S. Dulepet
R. Duraiswami
35
0
0
12 Feb 2024
Suppressing Pink Elephants with Direct Principle Feedback
Louis Castricato
Nathan Lile
Suraj Anand
Hailey Schoelkopf
Siddharth Verma
Stella Biderman
71
10
0
12 Feb 2024
Anchor-based Large Language Models
Jianhui Pang
Fanghua Ye
Derek F. Wong
Xin He
Wanshun Chen
Longyue Wang
KELM
61
8
0
12 Feb 2024
The I/O Complexity of Attention, or How Optimal is Flash Attention?
Barna Saha
Christopher Ye
32
5
0
12 Feb 2024
Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue
Jian Wang
Chak Tou Leong
Jiashuo Wang
Dongding Lin
Wenjie Li
Xiao-Yong Wei
50
7
0
10 Feb 2024
On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference
Siyu Ren
Kenny Q. Zhu
31
27
0
09 Feb 2024
ForestColl: Throughput-Optimal Collective Communications on Heterogeneous Network Fabrics
Liangyu Zhao
Saeed Maleki
Ziyue Yang
Hossein Pourreza
Aashaka Shah
44
0
0
09 Feb 2024
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù
Zdeněk Kasner
Siva Reddy
39
62
0
08 Feb 2024
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balažević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
108
25
0
08 Feb 2024