
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 of 1,508 papers shown
FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving
Yutao Zhu
Xiaosong Jia
Xinyu Yang
Junchi Yan
ViT
78
6
0
01 Jul 2025
When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
Zhen Xu
Shang Zhu
Jue Wang
Junlin Wang
Ben Athiwaratkun
Chi Wang
James Zou
Ce Zhang
LLMAG
12
0
0
19 Jun 2025
LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning
Haoyue Zhang
Hualei Zhang
Xiaosong Ma
Jie Zhang
Song Guo
LRM
17
0
0
19 Jun 2025
LBMamba: Locally Bi-directional Mamba
Jingwei Zhang
Xi Han
Hong Qin
Mahdi S. Hosseini
Dimitris Samaras
Mamba
30
0
0
19 Jun 2025
Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models
Daniel Fidel Harvey
George Weale
Berk Yilmaz
MoE
7
0
0
19 Jun 2025
REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing
Kangqi Chen
Andreas Kosmas Kakolyris
Rakesh Nadig
Manos Frouzakis
Nika Mansouri-Ghiasi
Yu Liang
Haiyu Mao
Jisung Park
Mohammad Sadrosadati
Onur Mutlu
RALM
36
0
0
19 Jun 2025
Zero-Shot Reinforcement Learning Under Partial Observability
Scott Jeen
Tom Bewley
Jonathan M. Cullen
OffRL
20
0
0
18 Jun 2025
T-SHRED: Symbolic Regression for Regularization and Model Discovery with Transformer Shallow Recurrent Decoders
Alexey Yermakov
David Zoro
Mars Liyao Gao
J. Nathan Kutz
15
0
0
18 Jun 2025
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models
Jesmin Jahan Tithi
Hanjiang Wu
Avishaii Abuhatzera
Fabrizio Petrini
MoE, ALM
11
0
0
17 Jun 2025
Efficient Serving of LLM Applications with Probabilistic Demand Modeling
Yifei Liu
Zuo Gan
Zhenghao Gan
Weiye Wang
Chen Chen
...
Xusheng Chen
Zhenhua Han
Yifei Zhu
Shixuan Sun
Minyi Guo
15
0
0
17 Jun 2025
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Runpeng Yu
Qi Li
Xinchao Wang
DiffM, AI4CE
22
0
0
16 Jun 2025
StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns
Luanbo Wan
Weizhi Ma
LLMAG, KELM
24
0
0
16 Jun 2025
Scaling Algorithm Distillation for Continuous Control with Mamba
Samuel Beaussant
Mehdi Mounsif
20
0
0
16 Jun 2025
Personalizable Long-Context Symbolic Music Infilling with MIDI-RWKV
Christian Zhou-Zheng
Philippe Pasquier
20
0
0
16 Jun 2025
AlphaEvolve: A coding agent for scientific and algorithmic discovery
Alexander Novikov
Ngan Vu
Marvin Eisenberger
Emilien Dupont
Po-Sen Huang
...
George Holland
Alex Davies
Sebastian Nowozin
Pushmeet Kohli
Matej Balog
45
17
0
16 Jun 2025
Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
Teodora Srećković
Jonas Geiping
Antonio Orvieto
MoE
24
0
0
14 Jun 2025
QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm
Qirui Zhou
Shaohui Peng
Weiqiang Xiong
Haixin Chen
Yuanbo Wen
...
Ke Gao
Ruizhi Chen
Yanjun Wu
Chen Zhao
Y. Chen
LRM
19
0
0
14 Jun 2025
Bridging the Digital Divide: Small Language Models as a Pathway for Physics and Photonics Education in Underdeveloped Regions
Asghar Ghorbani
Hanieh Fattahi
16
0
0
14 Jun 2025
LIFELONG SOTOPIA: Evaluating Social Intelligence of Language Agents Over Lifelong Social Interactions
Hitesh Goel
Hao Zhu
CLL
33
0
0
14 Jun 2025
Semantic Scheduling for LLM Inference
Wenyue Hua
Dujian Ding
Yile Gu
Yujie Ren
Kai Mei
Minghua Ma
William Yang Wang
12
0
0
13 Jun 2025
Lag-Relative Sparse Attention In Long Context Training
Manlai Liang
Wanyi Huang
Mandi Liu
Huaijun Li
Jinlong Li
RALM
12
0
0
13 Jun 2025
The Effect of Stochasticity in Score-Based Diffusion Sampling: a KL Divergence Analysis
Bernardo P. Schaeffer
Ricardo M. S. Rosa
Glauco Valle
DiffM
10
0
0
13 Jun 2025
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Hui Wei
Dong Yoon Lee
Shubham Rohal
Zhizhang Hu
Shiwei Fang
Shijia Pan
28
0
0
13 Jun 2025
Two Heads Are Better than One: Simulating Large Transformers with Small Ones
Hantao Yu
Josh Alman
26
0
0
13 Jun 2025
Multi-Timescale Dynamics Model Bayesian Optimization for Plasma Stabilization in Tokamaks
Rohit Sonker
Alexandre Capone
Andrew Rothstein
Hiro Josep Farre Kaga
E. Kolemen
J. Schneider
AI4CE
114
0
0
12 Jun 2025
PyLO: Towards Accessible Learned Optimizers in PyTorch
Paul Janson
Benjamin Thérien
Quentin G. Anthony
Xiaolong Huang
A. Moudgil
Eugene Belilovsky
ODL, AI4CE
132
0
0
12 Jun 2025
NoLoCo: No-all-reduce Low Communication Training Method for Large Models
Jari Kolehmainen
Nikolay Blagoev
John Donaghy
Oğuzhan Ersoy
Christopher Nies
101
0
0
12 Jun 2025
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang
Mengzhen Liu
Lichen Li
Ming Lu
Yuan Zhang
Junwen Pan
Qi She
Shanghang Zhang
VLM
105
0
0
12 Jun 2025
On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention
Yeonju Ro
Zhenyu Zhang
Souvik Kundu
Zhangyang Wang
Aditya Akella
91
0
0
11 Jun 2025
Latent Multi-Head Attention for Small Language Models
Sushant Mehta
Raj Abhijit Dandekar
Rajat Dandekar
Sreedath Panat
RALM
41
0
0
11 Jun 2025
Scalable Non-Equivariant 3D Molecule Generation via Rotational Alignment
Yuhui Ding
Thomas Hofmann
DiffM, BDL
68
0
0
11 Jun 2025
Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction
Ruben Weitzman
Peter Mørch Groth
Lood Van Niekerk
Aoi Otani
Y. Gal
D. Marks
Pascal Notin
27
0
0
10 Jun 2025
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma
Longhui Wei
Feng Wang
Shiliang Zhang
Q. Tian
33
0
0
10 Jun 2025
Brevity is the soul of sustainability: Characterizing LLM response lengths
S. Poddar
Paramita Koley
Janardan Misra
Sanjay Podder
Navveen Balani
Niloy Ganguly
Saptarshi Ghosh
25
0
0
10 Jun 2025
Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs
Samarth Sikand
Rohit Mehra
Priyavanshi Pathania
Nikhil Bamby
Vibhu Saujanya Sharma
Vikrant Kaulgud
Sanjay Podder
Adam P. Burden
15
0
0
10 Jun 2025
TTrace: Lightweight Error Checking and Diagnosis for Distributed Training
Haitian Jiang
Shaowei Zhu
Zhen Zhang
Zhenyu Song
Xinwei Fu
Zhen Jia
Yida Wang
Jinyang Li
34
0
0
10 Jun 2025
Plug-and-Play Linear Attention for Pre-trained Image and Video Restoration Models
Srinivasan Kidambi
Pravin Nair
26
0
0
10 Jun 2025
JAFAR: Jack up Any Feature at Any Resolution
Paul Couairon
Loick Chambon
Louis Serrano
Jean-Emmanuel Haugeard
Matthieu Cord
Nicolas Thome
MDE
37
0
0
10 Jun 2025
Quantifying Mix Network Privacy Erosion with Generative Models
Vasilios Mavroudis
Tariq Elahi
23
0
0
10 Jun 2025
Beyond Benchmarks: A Novel Framework for Domain-Specific LLM Evaluation and Knowledge Mapping
Nitin Sharma
Thomas Wolfers
Çağatay Yıldız
ALM
17
0
0
09 Jun 2025
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
Andrew Z. Wang
Songwei Ge
Tero Karras
Ming-Yu Liu
Yogesh Balaji
28
0
0
09 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
29
0
0
09 Jun 2025
MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts
Wei Tao
Haocheng Lu
Xiaoyang Qu
Bin Zhang
Kai Lu
Jiguang Wan
Jianzong Wang
MQ, MoE
15
0
0
09 Jun 2025
Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference
Thomas Joshi
Herman Saini
Neil Dhillon
Antoni Viros i Martin
Kaoutar El Maghraoui
21
0
0
08 Jun 2025
Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
Pengfei Zhao
Rongbo Luan
Wei Zhang
Peng Wu
Sifeng He
23
0
0
08 Jun 2025
RARL: Improving Medical VLM Reasoning and Generalization with Reinforcement Learning and LoRA under Data and Hardware Constraints
Tan-Hanh Pham
Chris Ngo
OffRL, LRM
23
0
0
07 Jun 2025
Training with Confidence: Catching Silent Errors in Deep Learning Training with Automated Proactive Checks
Yuxuan Jiang
Ziming Zhou
Boyu Xu
Beijie Liu
Runhui Xu
Peng Huang
17
0
0
06 Jun 2025
Log-Linear Attention
Han Guo
Songlin Yang
Tarushii Goel
Eric P. Xing
Tri Dao
Yoon Kim
Mamba
158
1
0
05 Jun 2025
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu
Zeyu Zhang
Zhexin Li
Xuehai Bai
Yizeng Han
...
Jiahao He
Yuanyu He
F. Wang
Gholamreza Haffari
Bohan Zhuang
VGen, MQ
141
1
0
05 Jun 2025
Kinetics: Rethinking Test-Time Scaling Laws
Ranajoy Sadhukhan
Zhuoming Chen
Haizhong Zheng
Yang Zhou
Emma Strubell
Beidi Chen
103
0
0
05 Jun 2025