Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2307.08691
Cited By
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
17 July 2023
Tri Dao
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning"
50 / 329 papers shown
Title
Accelerating Prefilling for Long-Context LLMs via Sparse Pattern Sharing
Dan Peng
Zhihui Fu
Zewen Ye
Zhuoran Song
Jun Wang
33
0
0
26 May 2025
Understanding Transformer from the Perspective of Associative Memory
Shu Zhong
Mingyu Xu
Tenglong Ao
Guang Shi
47
1
0
26 May 2025
Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing
Zhaoyuan Su
Tingfeng Lan
Zirui Wang
Juncheng Yang
Yue Cheng
22
0
0
24 May 2025
Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization
Meng Li
Guangda Huzhang
Haibo Zhang
Xiting Wang
Anxiang Zeng
42
0
0
24 May 2025
MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention
Can Yaras
Alec S. Xu
Pierre Abillama
Changwoo Lee
Laura Balzano
36
0
0
24 May 2025
Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query
Yixuan Wang
Shiyu Ji
Yijun Liu
Yuzhuang Xu
Yang Xu
Qingfu Zhu
Wanxiang Che
68
0
0
24 May 2025
Training with Pseudo-Code for Instruction Following
Praveen Venkateswaran
Rudra Murthy
Riyaz Ahmad Bhat
Danish Contractor
ALM
LRM
100
0
0
23 May 2025
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
Shuang Wu
Youtian Lin
Feihu Zhang
Yifei Zeng
Yikang Yang
...
Jiachen Qian
Siyu Zhu
Xun Cao
Philip Torr
Yao Yao
3DGS
126
1
0
23 May 2025
Multimodal Conversation Structure Understanding
Kent K. Chang
Mackenzie Cramer
Anna Ho
Ti Ti Nguyen
Yilin Yuan
David Bamman
64
0
0
23 May 2025
FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding
Zhibin Wang
Rui Ning
Chao Fang
Zhonghui Zhang
Xi Lin
...
Rong Gu
Kun Yang
Guihai Chen
Sheng Zhong
Chen Tian
58
0
0
23 May 2025
MARché: Fast Masked Autoregressive Image Generation with Cache-Aware Attention
Chaoyi Jiang
Sungwoo Kim
Lei Gao
Hossein Entezari Zarch
Won Woo Ro
Murali Annavaram
24
0
0
22 May 2025
LongMagpie: A Self-synthesis Method for Generating Large-scale Long-context Instructions
Chaochen Gao
Xing Wu
Zijia Lin
Debing Zhang
Songlin Hu
SyDa
214
0
0
22 May 2025
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
Benjamin Schneider
Dongfu Jiang
Chao Du
Tianyu Pang
Wenhu Chen
VLM
77
0
0
22 May 2025
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
104
0
0
22 May 2025
Stronger ViTs With Octic Equivariance
David Nordström
Johan Edstedt
Fredrik Kahl
Georg Bökman
ViT
227
0
0
21 May 2025
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model
Mehrdad Ghassabi
Pedram Rostami
Hamidreza Baradaran Kashani
Amirhossein Poursina
Zahra Kazemi
Milad Tavakoli
LM&MA
191
0
0
21 May 2025
SUS backprop: linear backpropagation algorithm for long inputs in transformers
Sergey Pankov
Georges Harik
112
0
0
21 May 2025
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
Penghao Wu
Lewei Lu
Ziwei Liu
129
0
0
21 May 2025
Toward Embodied AGI: A Review of Embodied AI and the Road Ahead
Yequan Wang
Aixin Sun
LM&Ro
AI4CE
129
1
0
20 May 2025
s3: You Don't Need That Much Data to Train a Search Agent via RL
Pengcheng Jiang
Xueqiang Xu
Jiacheng Lin
Jinfeng Xiao
Zifeng Wang
Jimeng Sun
Jiawei Han
OffRL
RALM
AI4TS
LRM
113
1
0
20 May 2025
Low-Cost FlashAttention with Fused Exponential and Multiplication Hardware Operators
K. Alexandridis
Vasileios Titopoulos
G. Dimitrakopoulos
70
0
0
20 May 2025
FlashThink: An Early Exit Method For Efficient Reasoning
Guochao Jiang
Guofeng Quan
Zepeng Ding
Ziqin Luo
Dixuan Wang
Zheng Hu
ReLM
LRM
72
2
0
20 May 2025
LightRetriever: A LLM-based Hybrid Retrieval Architecture with 1000x Faster Query Inference
Guangyuan Ma
Yongliang Ma
Xuanrui Gou
Zhenpeng Su
Ming Zhou
Songlin Hu
RALM
86
0
0
18 May 2025
PSC: Extending Context Window of Large Language Models via Phase Shift Calibration
Wenqiao Zhu
Chao Xu
Lulu Wang
Jun Wu
107
1
0
18 May 2025
AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation
Xiechi Zhang
Zetian Ouyang
Linlin Wang
Gerard de Melo
Zhu Cao
Xiaoling Wang
Ya Zhang
Yanfeng Wang
Liang He
LM&MA
ELM
124
0
0
17 May 2025
Chain-of-Model Learning for Language Model
Kaitao Song
Xiaohua Wang
Xu Tan
Huiqiang Jiang
Chengruidong Zhang
...
Xiaoqing Zheng
Tao Qin
Yuqing Yang
Dongsheng Li
Lili Qiu
LRM
AI4CE
193
1
0
17 May 2025
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Huashan Sun
Shengyi Liao
Yansen Han
Yu Bai
Yang Gao
...
Weizhou Shen
Fanqi Wan
Ming Yan
J.N. Zhang
Fei Huang
177
0
0
16 May 2025
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Huazuo Gao
...
Wenfeng Liang
Ying He
Yun Wang
Yuxuan Liu
Y. X. Wei
MoE
74
1
0
14 May 2025
FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs
Pencuo Zeren
Qiuming Luo
Rui Mao
Chang Kong
31
0
0
13 May 2025
Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding
Dawei Huang
Qing Li
Chuan Yan
Zebang Cheng
Jiaming Ji
Xiang Li
Yangqiu Song
Xiaobei Wang
Zheng Lian
Xiaojiang Peng
71
1
0
10 May 2025
I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference
Zibo Gao
Junjie Hu
Feng Guo
Yixin Zhang
Yinglong Han
Siyuan Liu
Haiyang Li
Zhiqiang Lv
105
0
0
10 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
Dengyang Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
111
0
0
05 May 2025
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Jianxing Qin
Jingrong Chen
Xinhao Kong
Yongji Wu
Liang Luo
Ziyi Wang
Ying Zhang
Tingjun Chen
Alvin R. Lebeck
Danyang Zhuo
317
0
0
02 May 2025
PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding
Bradley McDanel
Shanghang Zhang
Y. Hu
Zining Liu
MoE
441
0
0
02 May 2025
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Muyi Bao
Shuchang Lyu
Zhaoyang Xu
Huiyu Zhou
Jinchang Ren
Shiming Xiang
Xuelong Li
Guangliang Cheng
Mamba
267
0
0
01 May 2025
FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension
Jushi Kai
Boyi Zeng
Yansen Wang
Haoli Bai
Ziwei He
Bo Jiang
Zhouhan Lin
133
0
0
01 May 2025
GPU Performance Portability needs Autotuning
Burkhard Ringlein
Thomas Parnell
Radu Stoica
451
0
0
30 Apr 2025
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Zayd Muhammad Kawakibi Zuhri
Erland Hilman Fuadi
Alham Fikri Aji
54
0
0
29 Apr 2025
Embedding Empirical Distributions for Computing Optimal Transport Maps
Mingchen Jiang
Peng Xu
Xichen Ye
Xiaohui Chen
Yun Yang
Yifan Chen
OT
126
0
0
24 Apr 2025
Distilling semantically aware orders for autoregressive image generation
Rishav Pramanik
Antoine Poupon
Juan A. Rodriguez
Masih Aminbeidokhti
David Vazquez
Christopher Pal
Zhaozheng Yin
M. Pedersoli
80
0
0
23 Apr 2025
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
Yucheng Li
Huiqiang Jiang
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Jianfeng Gao
Yue Yang
Lili Qiu
111
3
0
22 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
477
0
0
21 Apr 2025
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?
Rahul Thapa
Andrew Li
Qingyang Wu
Bryan He
Yuki Sahashi
...
Angela Zhang
Ben Athiwaratkun
Shuaiwen Leon Song
David Ouyang
James Zou
LM&MA
174
0
0
19 Apr 2025
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models
Xinlin Zhuang
Jiahui Peng
Ren Ma
Yucheng Wang
Tianyi Bai
Xingjian Wei
Jiantao Qiu
Chi Zhang
Ying Qian
Conghui He
151
0
0
19 Apr 2025
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao
Devaansh Gupta
Qinqing Zheng
Aditya Grover
DiffM
LRM
AI4CE
157
9
0
16 Apr 2025
CSPLADE: Learned Sparse Retrieval with Causal Language Models
Zhichao Xu
Aosong Feng
Yijun Tian
Haibo Ding
Lin Leee Cheong
RALM
105
0
0
15 Apr 2025
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
Yangshen Deng
Zhengxin You
Long Xiang
Qilong Li
Peiqi Yuan
...
Man Lung Yiu
Huan Li
Qiaomu Shen
Rui Mao
Bo Tang
85
0
0
14 Apr 2025
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAG
LRM
437
9
0
14 Apr 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
Wissam Antoun
B. Sagot
Djamé Seddah
MQ
70
1
0
11 Apr 2025
Distilling Textual Priors from LLM to Efficient Image Fusion
Ran Zhang
Xuanhua He
Ke Cao
Liu Liu
Li Zhang
Man Zhou
Jie Zhang
92
0
0
09 Apr 2025
Previous
1
2
3
4
5
6
7
Next