arXiv:2205.14135
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
27 May 2022
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
VLM
Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
50 / 1,427 papers shown
Parallel Scaling Law for Language Models
Mouxiang Chen, Binyuan Hui, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Jianling Sun, Junyang Lin, Zhongxin Liu
MoE, LRM
15 May 2025
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
Ke Wang, Junting Pan, Linda Wei, Aojun Zhou, Weikang Shi, ..., Han Xiao, Y. Yang, Houxing Ren, Mingjie Zhan, Hongsheng Li
15 May 2025
Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios
Huafeng Shi, Jianzhong Liang, Rongchang Xie, Xian Wu, Cheng Chen, Chang Liu
VGen
14 May 2025
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, ..., Wenfeng Liang, Ying He, Yuxiang Wang, Yuxuan Liu, Y. X. Wei
MoE
14 May 2025
Scaling Context, Not Parameters: Training a Compact 7B Language Model for Efficient Long-Context Processing
Chen Wu, Yin Song
MoE, LRM
13 May 2025
OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain
Wenzhen Yue, Y. Liu, Haoxuan Li, Hao Wang, Xianghua Ying, Ruohao Guo, Bowei Xing, Ji Shi
AI4TS, OOD
12 May 2025
Fused3S: Fast Sparse Attention on Tensor Cores
Zitong Li, Aparna Chandramowlishwaran
GNN
12 May 2025
Putting It All into Context: Simplifying Agents with LCLMs
Mingjian Jiang, Yangjun Ruan, Luis A. Lastras, Pavan Kapanipathi, Tatsunori Hashimoto
LLMAG
12 May 2025
Attention Is Not All You Need: The Importance of Feedforward Networks in Transformer Models
Isaac Gerber
10 May 2025
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
HamidReza Imani, Jiaxin Peng, Peiman Mohseni, Abdolah Amirany, Tarek A. El-Ghazawi
MoE
10 May 2025
LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization
Seunghee Han, S. Choi, J. Kim
09 May 2025
Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference
Haolin Zhang, Jeff Huang
09 May 2025
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation
Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, G. Zhou, Xiangdong Zhou
07 May 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Xueyao Zhang, Y. Wang, Chaoren Wang, Z. Li, Zhuo Chen, Zhizheng Wu
07 May 2025
MFSeg: Efficient Multi-frame 3D Semantic Segmentation
Chengjie Huang, Krzysztof Czarnecki
3DPC
07 May 2025
Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality
Xueguang Ma, Luyu Gao, Shengyao Zhuang, Jiaqi Samantha Zhan, Jamie Callan, Jimmy Lin
05 May 2025
SCFormer: Structured Channel-wise Transformer with Cumulative Historical State for Multivariate Time Series Forecasting
Shiwei Guo, Z. Chen, Yupeng Ma, Yunfei Han, Yi Wang
AI4TS
05 May 2025
Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution
Xingyu Zhou, Wei Long, Jingbo Lu, Shiyin Jiang, Weiyi You, Haifeng Wu, Shuhang Gu
04 May 2025
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Jianxing Qin, Jingrong Chen, Xinhao Kong, Yongji Wu, Liang Luo, ziqi wang, Ying Zhang, Tingjun Chen, Alvin R. Lebeck, Danyang Zhuo
02 May 2025
PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding
Bradley McDanel, S. Zhang, Y. Hu, Zining Liu
MoE
02 May 2025
BiGSCoder: State Space Model for Code Understanding
Shweta Verma, Abhinav Anand, Mira Mezini
Mamba
02 May 2025
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Muyi Bao, Shuchang Lyu, Zhaoyang Xu, Huiyu Zhou, Jinchang Ren, Shiming Xiang, X. Li, Guangliang Cheng
Mamba
01 May 2025
Scaling On-Device GPU Inference for Large Generative Models
Jiuqiang Tang, Raman Sarokin, Ekaterina Ignasheva, Grant Jensen, Lin Chen, Juhyun Lee, Andrei Kulik, Matthias Grundmann
01 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
MoE, VLM
01 May 2025
GPU Performance Portability needs Autotuning
Burkhard Ringlein, Thomas Parnell, Radu Stoica
30 Apr 2025
Ascendra: Dynamic Request Prioritization for Efficient LLM Serving
Azam Ikram, Xiang Li, Sameh Elnikety, S. Bagchi
29 Apr 2025
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Zayd Muhammad Kawakibi Zuhri, Erland Hilman Fuadi, Alham Fikri Aji
29 Apr 2025
Blockbuster, Part 1: Block-level AI Operator Fusion
Ofer Dekel
29 Apr 2025
Combatting Dimensional Collapse in LLM Pre-Training Data via Diversified File Selection
Ziqing Fan, Siyuan Du, Shengchao Hu, Pingjie Wang, Li Shen, Y. Zhang, Dacheng Tao, Y. Wang
29 Apr 2025
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Zhenyu (Allen) Zhang, Zechun Liu, Yuandong Tian, Harshit Khaitan, ziqi wang, Steven Li
28 Apr 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo, Tetsuji Ogawa
28 Apr 2025
Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling
Ishan Kavathekar, Raghav Donakanti, Ponnurangam Kumaraguru, Karthik Vaidhyanathan
27 Apr 2025
A Method for the Architecture of a Medical Vertical Large Language Model Based on Deepseek R1
Mingda Zhang, Jianglong Qin
MedIm
25 Apr 2025
TileLang: A Composable Tiled Programming Model for AI Systems
Lei Wang, Yu Cheng, Yining Shi, Zhengju Tang, Zhiwen Mo, ..., Lingxiao Ma, Yuqing Xia, Jilong Xue, Fan Yang, Z. Yang
24 Apr 2025
L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference
Qingyuan Liu, Liyan Chen, Yanning Yang, H. Wang, Dong Du, Zhigang Mao, Naifeng Jing, Yubin Xia, Haibo Chen
24 Apr 2025
Embedding Empirical Distributions for Computing Optimal Transport Maps
Mingchen Jiang, Peng Xu, Xichen Ye, Xiaohui Chen, Yun Yang, Yifan Chen
OT
24 Apr 2025
Tempo: Application-aware LLM Serving with Mixed SLO Requirements
Wei Zhang, Zhiyu Wu, Yi Mu, Banruo Liu, Myungjin Lee, Fan Lai
24 Apr 2025
An Empirical Study on Prompt Compression for Large Language Models
Z. Zhang, Jinyi Li, Yihuai Lan, X. Wang, Hao Wang
MQ
24 Apr 2025
Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light
Ali Hassani, Fengzhe Zhou, Aditya Kane, Jiannan Huang, Chieh-Yun Chen, ..., Bing Xu, Haicheng Wu, Wen-mei W. Hwu, Ming-Yu Liu, Humphrey Shi
23 Apr 2025
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An, Huajun Bai, Z. Liu, Dong Li, E. Barsoum
23 Apr 2025
llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length
Issa Sugiura, Kouta Nakayama, Yusuke Oda
22 Apr 2025
COBRA: Algorithm-Architecture Co-optimized Binary Transformer Accelerator for Edge Inference
Ye Qiao, Zhiheng Cheng, Yian Wang, Yifan Zhang, Yunzhe Deng, Sitao Huang
22 Apr 2025
Bidirectional Mamba for Single-Cell Data: Efficient Context Learning with Biological Fidelity
Cong Qi, Hanzhang Fang, Tianxing Hu, Siqi Jiang, Wei Zhi
Mamba
22 Apr 2025
Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis
X. Zhang, Yaoyao Ding, Yang Hu, Gennady Pekhimenko
22 Apr 2025
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick, Eric P. Xing, Albert Gu
RALM
22 Apr 2025
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park, Dalton Jones, Matt Morse, Raghavv Goel, Mingu Lee, Chris Lott
21 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu, Shen Yan, Sijun Zhang, Jianqiao Lu, Yutao Zeng, Ya Wang, Xun Zhou
21 Apr 2025
SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training
Z. Li, Y. Liu, W. Zhang, Tailing Yuan, Bin Chen, Chengru Song, Di Zhang
20 Apr 2025
Learning to Attribute with Attention
Benjamin Cohen-Wang, Yung-Sung Chuang, Aleksander Madry
18 Apr 2025
Antidistillation Sampling
Yash Savani, Asher Trockman, Zhili Feng, Avi Schwarzschild, Alexander Robey, Marc Finzi, J. Zico Kolter
17 Apr 2025