2205.14135
Cited By
v1
v2 (latest)
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
Papers citing
"FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
50 / 1,508 papers shown
Title
Kinetics: Rethinking Test-Time Scaling Laws
Ranajoy Sadhukhan
Zhuoming Chen
Haizhong Zheng
Yang Zhou
Emma Strubell
Beidi Chen
103
0
0
05 Jun 2025
Log-Linear Attention
Han Guo
Songlin Yang
Tarushii Goel
Eric P. Xing
Tri Dao
Yoon Kim
Mamba
158
1
0
05 Jun 2025
AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models
Yifeng Gu
Zicong Jiang
Jianxiu Jin
K. Guo
Ziyang Zhang
Xiangmin Xu
101
0
0
04 Jun 2025
TokAlign: Efficient Vocabulary Adaptation via Token Alignment
Chong Li
Jiajun Zhang
Chengqing Zong
VLM
53
0
0
04 Jun 2025
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
Hermann Kumbong
Xian Liu
Tsung-Yi Lin
Ming-Yu Liu
Xihui Liu
Ziwei Liu
Daniel Y. Fu
Christopher Ré
David W. Romero
DiffM
48
0
0
04 Jun 2025
Homogeneous Keys, Heterogeneous Values: Exploiting Local KV Cache Asymmetry for Long-Context LLMs
Wanyun Cui
Mingwei Xu
15
0
0
04 Jun 2025
Video, How Do Your Tokens Merge?
Sam Pollard
Michael Wray
ViT
MoMe
69
0
0
04 Jun 2025
Comba: Improving Bilinear RNNs with Closed-loop Control
Jiaxi Hu
Yongqi Pan
Jusen Du
Disen Lan
Xiaqiang Tang
Qingsong Wen
Yuxuan Liang
Weigao Sun
74
0
0
03 Jun 2025
Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization
Ruilong Wu
Xinjiao Li
Yisu Wang
Xinyu Chen
Dirk Kutscher
54
0
0
03 Jun 2025
QKV Projections Require a Fraction of Their Memory
Malik Khalf
Yara Shamshoum
Nitzan Hodos
Yuval Sieradzki
Assaf Schuster
MQ
VLM
56
0
0
03 Jun 2025
HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference
Ping Gong
Jiawei Yi
Shengnan Wang
Juncheng Zhang
Zewen Jin
...
Tong Yang
Gong Zhang
Renhai Chen
Feng Wu
Cheng Li
50
0
0
03 Jun 2025
Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas
Austin Silveria
Soham V. Govande
Daniel Y. Fu
15
0
0
03 Jun 2025
Contrast & Compress: Learning Lightweight Embeddings for Short Trajectories
Abhishek Vivekanandan
Christian Hubschneider
J. M. Zöllner
43
0
0
03 Jun 2025
Leveraging Natural Language Processing to Unravel the Mystery of Life: A Review of NLP Approaches in Genomics, Transcriptomics, and Proteomics
Ella Rannon
David Burstein
AI4TS
19
0
0
02 Jun 2025
Generic Token Compression in Multimodal Large Language Models from an Explainability Perspective
Lei Lei
Jie Gu
Xiaokang Ma
Chu Tang
Jingmin Chen
Tong Xu
30
1
0
01 Jun 2025
Foresight: Adaptive Layer Reuse for Accelerated and High-Quality Text-to-Video Generation
Muhammad Adnan
Nithesh Kurella
Akhil Arunkumar
Prashant J. Nair
DiffM
VGen
27
0
0
31 May 2025
Probabilistic Forecasting for Building Energy Systems using Time-Series Foundation Models
Young-Jin Park
François Germain
Jing Liu
Ye Wang
T. Koike-Akino
Gordon Wichern
Navid Azizan
C. Laughman
Ankush Chakrabarty
AI4TS
AI4CE
30
0
0
31 May 2025
SALE : Low-bit Estimation for Efficient Sparse Attention in Long-context LLM Prefilling
Xiaodong Ji
Hailin Zhang
Fangcheng Fu
Bin Cui
17
0
0
30 May 2025
50 Years of Automated Face Recognition
Minchul Kim
Anil K. Jain
Xiaoming Liu
68
0
0
30 May 2025
AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity
Yu Zhang
Dong Guo
Fang Wu
Guoliang Zhu
Dian Ding
Yiming Zhang
72
1
0
29 May 2025
Actor-Critic based Online Data Mixing For Language Model Pre-Training
Jing Ma
Chenhao Dang
Mingjie Liao
23
0
0
29 May 2025
LoLA: Low-Rank Linear Attention With Sparse Caching
Luke McDermott
Robert W. Heath Jr.
Rahul Parhi
RALM
53
0
0
29 May 2025
Large Language Model Meets Constraint Propagation
Alexandre Bonlarron
Florian Régin
Elisabetta De Maria
Jean-Charles Régin
35
0
0
29 May 2025
Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models
Lang Cao
Jingxian Xu
Hanbing Liu
Jinyu Wang
Mengyu Zhou
Haoyu Dong
Shi Han
Dongmei Zhang
LRM
OffRL
LMTD
ReLM
58
0
0
29 May 2025
Advancing Expert Specialization for Better MoE
Hongcan Guo
Haolang Lu
Guoshun Nan
Bolun Chu
Jialin Zhuang
Yuan Yang
Wenhao Che
Sicong Leng
Qimei Cui
Xudong Jiang
MoE
MoMe
84
0
0
28 May 2025
FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference
Aniruddha Nrusimha
William Brandon
Mayank Mishra
Yikang Shen
Rameswar Panda
Jonathan Ragan-Kelley
Yoon Kim
VLM
17
0
0
28 May 2025
Towards Scalable Language-Image Pre-training for 3D Medical Imaging
Chenhui Zhao
Yiwei Lyu
Asadur Chowdury
Edward Harake
A. Kondepudi
Akshay Rao
X. Hou
Honglak Lee
Todd C. Hollon
LM&MA
MedIm
39
0
0
28 May 2025
Learning in Compact Spaces with Approximately Normalized Transformers
Jörg Franke
Urs Spiegelhalter
Marianna Nezhurina
J. Jitsev
Frank Hutter
Michael Hefenbrock
57
0
0
28 May 2025
EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse
Tianyu Guo
Hande Dong
Yichong Leng
Feng Liu
Cheater Lin
Nong Xiao
X. Zhang
RALM
15
0
0
28 May 2025
The quest for the GRAph Level autoEncoder (GRALE)
Paul Krzakala
Gabriel Melo
Charlotte Laclau
Florence d'Alché-Buc
Rémi Flamary
48
0
0
28 May 2025
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
Ce Zhang
Kaixin Ma
Tianqing Fang
Wenhao Yu
Hongming Zhang
Zhisong Zhang
Yaqi Xie
Katia Sycara
Haitao Mi
Dong Yu
VLM
96
0
0
28 May 2025
Geometric Hyena Networks for Large-scale Equivariant Learning
Artem Moskalev
Mangal Prakash
Junjie Xu
Tianyu Cui
Rui Liao
Tommaso Mansi
45
1
0
28 May 2025
Curse of High Dimensionality Issue in Transformer for Long-context Modeling
Shuhai Zhang
Zeng You
Yaofo Chen
Z. Wen
Qianyue Wang
Zhijie Qiu
Yuanqing Li
Mingkui Tan
40
0
0
28 May 2025
In Search of Adam's Secret Sauce
Antonio Orvieto
Robert Gower
25
1
0
27 May 2025
Visual Product Graph: Bridging Visual Products And Composite Images For End-to-End Style Recommendations
Yue Li Du
Ben Alexander
Mikhail Antonenka
Rohan Mahadev
Hao Wu
Dmitry Kislyuk
36
0
0
27 May 2025
Hardware-Efficient Attention for Fast Decoding
Ted Zadouri
Hubert Strauss
Tri Dao
66
2
0
27 May 2025
efunc: An Efficient Function Representation without Neural Networks
Biao Zhang
Peter Wonka
14
0
0
27 May 2025
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences
Jungyoub Cha
Hyunjong Kim
Sungzoon Cho
VLM
72
0
0
27 May 2025
SageAttention2++: A More Efficient Implementation of SageAttention2
Jintao Zhang
Xiaoming Xu
Jia Wei
Haofeng Huang
Pengle Zhang
Chendong Xiang
Jun Zhu
Jianfei Chen
MQ
VLM
83
7
0
27 May 2025
Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers
Yukun Zhang
Xueqing Zhou
AI4TS
33
0
0
27 May 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao
Keda Tao
Can Qin
Haoxuan You
Yang Sui
Huan Wang
VLM
58
0
0
27 May 2025
Skrull: Towards Efficient Long Context Fine-tuning through Dynamic Data Scheduling
Hongtao Xu
Wenting Shen
Yuanxin Wei
Ang Wang
Guo Runfan
Tianxing Wang
Yong Li
Mingzhen Li
Weile Jia
26
0
0
26 May 2025
Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation
Tanjil Hasan Sakib
Md. Tanzib Hosain
Md. Kishor Morol
ALM
36
0
0
26 May 2025
Understanding Transformer from the Perspective of Associative Memory
Shu Zhong
Mingyu Xu
Tenglong Ao
Guang Shi
47
1
0
26 May 2025
Lorentz Local Canonicalization: How to Make Any Network Lorentz-Equivariant
Jonas Spinner
Luigi Favaro
Peter Lippmann
Sebastian Pitz
Gerrit Gerhartz
Tilman Plehn
Fred Hamprecht
AI4CE
25
1
0
26 May 2025
Large Language Models as Autonomous Spacecraft Operators in Kerbal Space Program
Alejandro Carrasco
Victor Rodríguez-Fernández
Richard Linares
LLMAG
15
0
0
26 May 2025
Estimating Online Influence Needs Causal Modeling! Counterfactual Analysis of Social Media Engagement
Lin Tian
Marian-Andrei Rizoiu
CML
24
0
0
25 May 2025
Jodi: Unification of Visual Generation and Understanding via Joint Modeling
Yifeng Xu
Zhenliang He
Meina Kan
Shiguang Shan
Xilin Chen
VLM
83
0
0
25 May 2025
100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?
Wang Yang
Hongye Jin
Shaochen Zhong
Song Jiang
Qifan Wang
Vipin Chaudhary
Xiaotian Han
ELM
44
0
0
25 May 2025
MTGR: Industrial-Scale Generative Recommendation Framework in Meituan
Ruidong Han
Bin Yin
S. Chen
He Jiang
F. Jiang
...
Yueming Han
M. Zhou
Lei Yu
Chuan Liu
Wei Lin
LRM
31
1
0
24 May 2025