FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM
ArXiv (abs) · PDF · HTML

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,508 papers shown
A Full-History Network Dataset for BTC Asset Decentralization Profiling
Ling Cheng
Qian Shao
Fengzhu Zeng
Feida Zhu
71
0
0
19 Nov 2024
Direct and Explicit 3D Generation from a Single Image
Haoyu Wu
Meher Gitika Karumuri
Chuhang Zou
Seungbae Bang
Yuelong Li
Dimitris Samaras
Sunil Hadap
3DGSMDE3DV
101
2
0
17 Nov 2024
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Yuhong Chou
Man Yao
Kexin Wang
Yuqi Pan
Ruijie Zhu
Yiran Zhong
Yu Qiao
Jian Wu
Bo Xu
Guoqi Li
93
7
0
16 Nov 2024
IntentGPT: Few-shot Intent Discovery with Large Language Models
Juan A. Rodriguez
Nicholas Botzer
David Vazquez
Christopher Pal
M. Pedersoli
I. Laradji
VLM
133
3
0
16 Nov 2024
Hysteresis Activation Function for Efficient Inference
Moshe Kimhi
Idan Kashani
A. Mendelson
Chaim Baskin
LLMSV
115
0
0
15 Nov 2024
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee
Jiwoong Park
Jinseok Kim
Yongjik Kim
Jungju Oh
Jinwook Oh
Jungwook Choi
73
2
0
15 Nov 2024
A System Level Performance Evaluation for Superconducting Digital Systems
Joyjit Kundu
Debjyoti Bhattacharjee
Nathan Josephsen
Ankit Pokhrel
Udara De Silva
...
Steven Van Winckel
Steven Brebels
Manu Perumkunnil
Quentin Herr
Anna Herr
32
0
0
13 Nov 2024
Artificial Intelligence for Biomedical Video Generation
Linyuan Li
Jianing Qiu
Anujit Saha
Lin Li
Poyuan Li
Mengxian He
Ziyu Guo
Wu Yuan
VGen
175
0
0
12 Nov 2024
Spiking Transformer Hardware Accelerators in 3D Integration
Boxun Xu
Junyoung Hwang
Pruek Vanna-iampikul
Sung Kyu Lim
Peng Li
56
2
0
11 Nov 2024
MEANT: Multimodal Encoder for Antecedent Information
Benjamin Iyoya Irving
Annika Marie Schoene
AIFin
56
0
1
10 Nov 2024
Accelerating Large Language Model Training with 4D Parallelism and Memory Consumption Estimator
Kazuki Fujii
Kohei Watanabe
Rio Yokota
86
2
0
10 Nov 2024
SSSD: Simply-Scalable Speculative Decoding
Michele Marzollo
Jiawei Zhuang
Niklas Roemer
Lorenz K. Müller
Lukas Cavigelli
LRM
71
2
0
08 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
83
14
0
07 Nov 2024
Retentive Neural Quantum States: Efficient Ansätze for Ab Initio Quantum Chemistry
Oliver Knitter
Dan Zhao
J. Stokes
M. Ganahl
Stefan Leichenauer
S. Veerapaneni
59
2
0
06 Nov 2024
The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation
Lawrence Stewart
Matthew Trager
Sujan Kumar Gonugondla
Stefano Soatto
95
7
0
06 Nov 2024
Deploying Multi-task Online Server with Large Language Model
Yincen Qu
Chao Ma
Xiangying Dai
Hui Zhou
Yiting Wu
Hengyue Liu
58
0
0
06 Nov 2024
DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
Hao Phung
Quan Dao
T. Dao
Hoang Phan
Dimitris Metaxas
Anh Tran
Mamba
166
5
0
06 Nov 2024
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
Nizar Islah
Justine Gehring
Diganta Misra
Eilif B. Muller
Irina Rish
Terry Yue Zhuo
Massimo Caccia
SyDa
54
1
0
05 Nov 2024
LASER: Attention with Exponential Transformation
Sai Surya Duvvuri
Inderjit Dhillon
50
1
0
05 Nov 2024
LiVOS: Light Video Object Segmentation with Gated Linear Matching
Qin Liu
Jianfeng Wang
Zhiyong Yang
Linjie Li
Kevin Qinghong Lin
Marc Niethammer
Lijuan Wang
VOS
78
1
0
05 Nov 2024
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
Wei Wu
Zhuoshi Pan
Chao Wang
L. Chen
Y. Bai
Kun Fu
Zehua Wang
Hui Xiong
Hui Xiong
LLMAG
176
7
0
05 Nov 2024
PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption
Yifan Tan
Cheng Tan
Zeyu Mi
Haibo Chen
66
1
0
04 Nov 2024
Training Compute-Optimal Protein Language Models
Xingyi Cheng
Bo Chen
Pan Li
Jing Gong
Jie Tang
Le Song
123
17
0
04 Nov 2024
Code-Switching Curriculum Learning for Multilingual Transfer in LLMs
Haneul Yoo
Cheonbok Park
Sangdoo Yun
Alice Oh
Hwaran Lee
93
5
0
04 Nov 2024
Context Parallelism for Scalable Million-Token Inference
Amy Yang
Jingyi Yang
Aya Ibrahim
Xinfeng Xie
Bangsheng Tang
Grigory Sizov
Jeremy Reizenstein
Jongsoo Park
Jianyu Huang
MoELRM
175
7
0
04 Nov 2024
Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision
Xiangzhong Luo
Di Liu
Hao Kong
Shuo Huai
Hui Chen
Guochu Xiong
Weichen Liu
60
6
0
03 Nov 2024
Efficient Sparse Training with Structured Dropout
Andy Lo
BDL
63
0
0
02 Nov 2024
Transfer Learning for Finetuning Large Language Models
Tobias Strangmann
Lennart Purucker
Jörg Franke
Ivo Rapant
Fabio Ferreira
Frank Hutter
110
0
0
02 Nov 2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
Gavia Gray
Aman Tiwari
Shane Bergsma
Joel Hestness
85
2
0
01 Nov 2024
Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval
Nikolaos Flemotomos
Roger Hsiao
P. Swietojanski
Takaaki Hori
Dogan Can
Xiaodan Zhuang
126
1
0
01 Nov 2024
A Lorentz-Equivariant Transformer for All of the LHC
Johann Brehmer
Victor Bresó
P. D. Haan
Tilman Plehn
Huilin Qu
Jonas Spinner
Jesse Thaler
BDL
106
17
0
01 Nov 2024
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü
Destiny Okpekpe
Antonio Orvieto
Mamba
72
1
0
31 Oct 2024
ALISE: Accelerating Large Language Model Serving with Speculative Scheduling
Youpeng Zhao
Jun Wang
69
0
0
31 Oct 2024
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal
Tian Yun
Nihal V. Nayak
Jack Merullo
Stephen H. Bach
Chen Sun
Ellie Pavlick
VLMAI4CEOnRL
104
2
0
30 Oct 2024
Does equivariance matter at scale?
Johann Brehmer
S. Behrends
P. D. Haan
Taco S. Cohen
106
15
0
30 Oct 2024
BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference
Junqi Zhao
Zhijin Fang
Shu Li
Shaohui Yang
Shichao He
67
3
0
30 Oct 2024
FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution
Shuai Wang
Zexian Li
Tianhui Song
Xubin Li
Tiezheng Ge
Bo Zheng
Liwen Wang
105
3
0
30 Oct 2024
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
151
3
0
29 Oct 2024
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting
Sunghwan Hong
Jaewoo Jung
Heeseong Shin
Jisang Han
Jiaolong Yang
Chong Luo
Seungryong Kim
3DGS
81
12
0
29 Oct 2024
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
Xiaoniu Song
Zihang Zhong
Rong Chen
Haibo Chen
MoE
127
6
0
29 Oct 2024
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback
Nour Jedidi
Yung-Sung Chuang
Leslie Shing
James R. Glass
RALM
64
1
0
28 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
137
7
0
28 Oct 2024
GenUP: Generative User Profilers as In-Context Learners for Next POI Recommender Systems
Wilson Wongso
Hao Xue
Flora D. Salim
59
2
0
28 Oct 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun
Li-Wen Chang
Yiyuan Ma
Wenlei Bao
Ningxin Zheng
Xin Liu
Harry Dong
Yuejie Chi
Beidi Chen
VLM
165
21
0
28 Oct 2024
Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun
Momin Haider
Ruiqi Zhang
Huitao Yang
Jiahao Qiu
Ming Yin
Mengdi Wang
Peter L. Bartlett
Andrea Zanette
BDL
112
52
0
26 Oct 2024
MatExpert: Decomposing Materials Discovery by Mimicking Human Experts
Qianggang Ding
Santiago Miret
Bang Liu
MoE
69
8
0
26 Oct 2024
Understanding Adam Requires Better Rotation Dependent Assumptions
Lucas Maes
Tianyue H. Zhang
Alexia Jolicoeur-Martineau
Ioannis Mitliagkas
Damien Scieur
Simon Lacoste-Julien
Charles Guille-Escuret
70
3
0
25 Oct 2024
Two are better than one: Context window extension with multi-grained self-injection
Wei Han
Pan Zhou
Soujanya Poria
Shuicheng Yan
70
0
0
25 Oct 2024
Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code
Jipeng Zhang
Jianshu Zhang
Yuanzhe Li
Renjie Pi
Boyao Wang
Runtao Liu
Ziqiang Zheng
Tong Zhang
62
0
0
24 Oct 2024
BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching
Peizhuang Cong
Qizhi Chen
Haochen Zhao
Tong Yang
KELM
82
2
0
24 Oct 2024