Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2205.14135
Cited By
v1
v2 (latest)
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
50 / 1,508 papers shown
Title
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
Zichen Wen
Yifeng Gao
Weijia Li
Conghui He
Linfeng Zhang
LRM
172
6
0
17 Feb 2025
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
Yuxiang Huang
Mingye Li
Xu Han
Chaojun Xiao
Weilin Zhao
Sun Ao
Hao Zhou
Jie Zhou
Zhiyuan Liu
Maosong Sun
124
0
0
17 Feb 2025
Understanding Silent Data Corruption in LLM Training
Jeffrey Ma
Hengzhi Pei
Leonard Lausen
George Karypis
70
0
0
17 Feb 2025
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Jingcheng Ni
Yuxin Guo
Yichen Liu
Rui Chen
Lewei Lu
Z. Wu
DiffM
VGen
144
5
0
17 Feb 2025
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
Jiashuo Wang
Hansong Zhou
Ting Song
Shijie Cao
Yan Xia
Ting Cao
Jianyu Wei
Shuming Ma
Hongyu Wang
Furu Wei
127
1
0
17 Feb 2025
Low-Rank Thinning
Annabelle Michael Carrell
Albert Gong
Abhishek Shetty
Raaz Dwivedi
Lester W. Mackey
108
0
0
17 Feb 2025
DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services
Ting Sun
Penghan Wang
Fan Lai
95
0
0
17 Feb 2025
RaaS: Reasoning-Aware Attention Sparsity for Efficient LLM Reasoning
Junhao Hu
Wenrui Huang
Weidong Wang
Zhenwen Li
Tiancheng Hu
Zhixia Liu
Xusheng Chen
Tao Xie
Yizhou Shan
LRM
138
1
0
16 Feb 2025
KernelBench: Can LLMs Write Efficient GPU Kernels?
Anne Ouyang
Simon Guo
Simran Arora
Alex L. Zhang
William Hu
Christopher Ré
Azalia Mirhoseini
ALM
135
4
0
14 Feb 2025
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization
Bowen Pang
Kai Li
Ruifeng She
Feifan Wang
OffRL
113
2
0
14 Feb 2025
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Heejun Lee
G. Park
Jaduk Suh
Sung Ju Hwang
153
4
0
13 Feb 2025
Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding
Konstantin Berestizshevsky
Renzo Andri
Lukas Cavigelli
110
2
0
12 Feb 2025
Steel-LLM:From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
Qingshui Gu
Shu Li
Tianyu Zheng
Zhaoxiang Zhang
517
0
0
10 Feb 2025
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
Yang Zhou
Hongyi Liu
Zhuoming Chen
Yuandong Tian
Beidi Chen
LRM
112
14
0
07 Feb 2025
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
Feng Wang
Yaodong Yu
Guoyizhe Wei
Wei Shao
Yuyin Zhou
Alan Yuille
Cihang Xie
ViT
147
7
0
06 Feb 2025
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari
Haocheng Xi
Aditya Tomar
Coleman Hooper
Sehoon Kim
Maxwell Horton
Mahyar Najibi
Michael W. Mahoney
Kemal Kurniawan
Amir Gholami
MQ
112
5
0
05 Feb 2025
Unified Spatial-Temporal Edge-Enhanced Graph Networks for Pedestrian Trajectory Prediction
Ruochen Li
Tanqiu Qiao
Stamos Katsigiannis
Zhanxing Zhu
Hubert P. H. Shum
AI4TS
121
4
0
04 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu
Zhenheng Tang
Hong Chen
Peijie Dong
Zeyu Li
Xiuze Zhou
Bo Li
Xuming Hu
Xiaowen Chu
473
7
0
04 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
553
0
0
04 Feb 2025
Adaptive Self-improvement LLM Agentic System for ML Library Development
Genghan Zhang
Weixin Liang
Olivia Hsu
K. Olukotun
449
2
0
04 Feb 2025
Rethinking Homogeneity of Vision and Text Tokens in Large Vision-and-Language Models
Chia-Wen Kuo
Sijie Zhu
Fan Chen
Xiaohui Shen
Longyin Wen
VLM
93
1
0
04 Feb 2025
Latent Thought Models with Variational Bayes Inference-Time Computation
Deqian Kong
Minglu Zhao
Dehong Xu
Bo Pang
Shu Wang
...
Zhangzhang Si
Chuan Li
Jianwen Xie
Sirui Xie
Ying Nian Wu
VLM
LRM
BDL
143
10
0
03 Feb 2025
Transformers trained on proteins can learn to attend to Euclidean distance
Isaac Ellmen
Constantin Schneider
Matthew I.J. Raybould
Charlotte M. Deane
103
0
0
03 Feb 2025
Vision-centric Token Compression in Large Language Model
Ling Xing
Alex Jinpeng Wang
Rui Yan
Xiangbo Shu
Jinhui Tang
VLM
155
0
0
02 Feb 2025
PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration
Songhao Wu
Ang Lv
Xiao Feng
Yanzhe Zhang
Xun Zhang
Guojun Yin
Wei Lin
Rui Yan
MQ
91
1
0
01 Feb 2025
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
Nathaniel Tomczak
Sanmukh Kuppannagari
212
0
0
31 Jan 2025
Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks
Shengyao Zhuang
Ekaterina Khramtsova
Xueguang Ma
Bevan Koopman
Jimmy Lin
Guido Zuccon
AAML
100
1
0
28 Jan 2025
GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments
Yanyu Chen
Ganhong Huang
154
0
0
28 Jan 2025
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Zhihang Lin
Mingbao Lin
Luxi Lin
Rongrong Ji
108
24
0
28 Jan 2025
Qwen2.5-1M Technical Report
An Yang
Bowen Yu
Chong Li
Dayiheng Liu
Fei Huang
...
Xingzhang Ren
Xinlong Yang
You Li
Zhiying Xu
Zizhuo Zhang
139
29
0
28 Jan 2025
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Yuan Feng
Junlin Lv
Yukun Cao
Xike Xie
S. K. Zhou
VLM
150
44
0
28 Jan 2025
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Mohammad Mozaffari
Amir Yazdanbakhsh
Zhao Zhang
M. Dehnavi
146
7
0
28 Jan 2025
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
Zhan Ling
Kang Liu
Kai Yan
Yue Yang
Weijian Lin
Ting-Han Fan
Lingfeng Shen
Zhengyin Du
Jiecao Chen
ReLM
ELM
LRM
107
8
0
25 Jan 2025
Wormhole Memory: A Rubik's Cube for Cross-Dialogue Retrieval
Libo Wang
491
0
0
24 Jan 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng
Jerry Huang
Peng Lu
Gezheng Xu
Boxing Chen
Charles Ling
Boyu Wang
193
3
0
24 Jan 2025
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Jianing Yang
Alexander Sax
Kevin J. Liang
Mikael Henaff
Hao Tang
Ang Cao
J. Chai
Franziska Meier
Matt Feiszli
3DGS
191
31
0
23 Jan 2025
A Survey on Memory-Efficient Large-Scale Model Training in AI for Science
Kaiyuan Tian
Linbo Qiao
Baihui Liu
Gongqingjian Jiang
Dongsheng Li
98
0
0
21 Jan 2025
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Ting Sun
Penghan Wang
Fan Lai
549
2
0
15 Jan 2025
Estimating Musical Surprisal in Audio
Mathias Rose Bjare
Giorgia Cantisani
Stefan Lattner
Gerhard Widmer
78
0
0
13 Jan 2025
MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference
Wenxuan Zeng
Ye Dong
Jinjin Zhou
Junming Ma
Jin Tan
Runsheng Wang
Meng Li
100
0
0
12 Jan 2025
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation
Shunfan Zheng
Xiechi Zhang
Gerard de Melo
Xiaoling Wang
Linlin Wang
LM&MA
ELM
49
1
0
12 Jan 2025
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Yuwei Fang
Kwot Sin Lee
Ivan Skorokhodov
Kfir Aberman
Jun-Yan Zhu
Ming-Hsuan Yang
Sergey Tulyakov
DiffM
VGen
192
13
0
10 Jan 2025
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
94
0
0
10 Jan 2025
Clinical Insights: A Comprehensive Review of Language Models in Medicine
Nikita Neveditsin
Pawan Lingras
V. Mago
LM&MA
113
5
0
08 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
119
2
0
07 Jan 2025
TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms
Jovan Stojkovic
Chaojie Zhang
Íñigo Goiri
Esha Choukse
Haoran Qiu
Rodrigo Fonseca
Josep Torrellas
Ricardo Bianchini
80
6
0
05 Jan 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
Enzo Tartaglione
VLM
431
6
0
05 Jan 2025
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
Jiajun Zhu
Peihao Wang
Ruisi Cai
Jason D. Lee
Pan Li
Ziyi Wang
KELM
112
1
0
03 Jan 2025
FAST: Fast Audio Spectrogram Transformer
Anugunj Naman
Gaibo Zhang
63
0
0
03 Jan 2025
Text2midi: Generating Symbolic Music from Captions
Keshav Bhandari
Abhinaba Roy
Kyra Wang
Geeta Puri
Simon Colton
Dorien Herremans
158
6
0
03 Jan 2025
Previous
1
2
3
...
7
8
9
...
29
30
31
Next