2205.14135
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
Papers citing
"FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
50 / 1,508 papers shown
Title
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka
Taichi Iki
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
116
6
0
14 Apr 2025
Towards Quantifying Commonsense Reasoning with Mechanistic Insights
Abhinav Joshi
A. Ahmad
Divyaksh Shukla
Ashutosh Modi
ReLM
LRM
84
0
0
14 Apr 2025
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
Yangshen Deng
Zhengxin You
Long Xiang
Qilong Li
Peiqi Yuan
...
Man Lung Yiu
Huan Li
Qiaomu Shen
Rui Mao
Bo Tang
82
0
0
14 Apr 2025
Efficient LLM Serving on Hybrid Real-time and Best-effort Requests
Wan Borui
Zhao Juntao
Jiang Chenyu
Guo Chuanxiong
Wu Chuan
VLM
153
1
0
13 Apr 2025
FVQ: A Large-Scale Dataset and A LMM-based Method for Face Video Quality Assessment
Sijing Wu
Yunhao Li
Ziwen Xu
Yixuan Gao
Huiyu Duan
Wei Sun
Guangtao Zhai
360
1
0
12 Apr 2025
MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints
Yichao Yuan
Lin Ma
Nishil Talati
MoE
107
0
0
12 Apr 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
Wissam Antoun
B. Sagot
Djamé Seddah
MQ
70
1
0
11 Apr 2025
Particle Hit Clustering and Identification Using Point Set Transformers in Liquid Argon Time Projection Chambers
Edgar E. Robles
A. Yankelevich
Wenjie Wu
J. Bian
Pierre Baldi
63
0
0
11 Apr 2025
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
M. Dhouib
Davide Buscaldi
Sonia Vanier
A. Shabou
VLM
107
1
0
11 Apr 2025
Token Level Routing Inference System for Edge Devices
Jianshu She
Wenhao Zheng
Zhengzhong Liu
Hongyi Wang
Eric P. Xing
Huaxiu Yao
Qirong Ho
76
1
0
10 Apr 2025
Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving
Shihong Gao
Wei Wei
Yanyan Shen
Lei Chen
70
1
0
10 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Z. Huang
Zhe Chen
Zijia Zhao
Ziwei Chen
Zongyu Lin
MLLM
VLM
MoE
393
32
0
10 Apr 2025
Distilling Textual Priors from LLM to Efficient Image Fusion
Ran Zhang
Xuanhua He
Ke Cao
Liu Liu
Li Zhang
Man Zhou
Jie Zhang
90
0
0
09 Apr 2025
CHIME: A Compressive Framework for Holistic Interest Modeling
Yong Bai
Rui Xiang
Kaiyuan Li
Yongxiang Tang
Yanhua Cheng
Xialong Liu
Peng Jiang
Kun Gai
64
1
0
09 Apr 2025
Crafting Query-Aware Selective Attention for Single Image Super-Resolution
Junyoung Kim
Youngrok Kim
Siyeol Jung
Donghyun Min
89
0
0
09 Apr 2025
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Yoshihiro Yamada
ViT
79
0
0
09 Apr 2025
High-Resource Translation: Turning Abundance into Accessibility
Abhiram Reddy Yanampally
46
0
0
08 Apr 2025
SPIRe: Boosting LLM Inference Throughput with Speculative Decoding
Sanjit Neelam
Daniel Heinlein
Vaclav Cvicek
Akshay Mishra
Reiner Pope
LRM
77
0
0
08 Apr 2025
TAGC: Optimizing Gradient Communication in Distributed Transformer Training
Igor Polyakov
Alexey Dukhanov
Egor Spirin
81
0
0
08 Apr 2025
PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning
Xinpeng Ding
Kai Zhang
Jinahua Han
Lanqing Hong
Hang Xu
Xuelong Li
MLLM
VLM
498
0
0
08 Apr 2025
Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching
Yanhao Dong
Yubo Miao
Weinan Li
Xiao Zheng
Chao Wang
Feng Lyu
59
0
0
08 Apr 2025
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
Yanbiao Liang
Huihong Shi
Haikuo Shao
Zhongfeng Wang
94
0
0
07 Apr 2025
One-Minute Video Generation with Test-Time Training
Karan Dalal
Daniel Koceja
Gashon Hussein
Jiarui Xu
Yue Zhao
...
Tatsunori Hashimoto
Sanmi Koyejo
Yejin Choi
Yu Sun
Xiaolong Wang
ViT
181
13
0
07 Apr 2025
Window Token Concatenation for Efficient Visual Large Language Models
Yifan Li
Wentao Bao
Botao Ye
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
VLM
97
0
0
05 Apr 2025
PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models
Haofei Yin
Mengbai Xiao
Rouzhou Lu
Xiao Zhang
Dongxiao Yu
Guanghui Zhang
AI4CE
79
0
0
05 Apr 2025
Reasoning on Multiple Needles In A Haystack
Yidong Wang
LRM
52
0
0
05 Apr 2025
Clinical ModernBERT: An efficient and long context encoder for biomedical text
Simon A. Lee
Anthony Wu
Jeffrey N. Chiang
MedIm
95
6
0
04 Apr 2025
Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable
Xin Jin
Simon Niklaus
Zhoutong Zhang
Zhihao Xia
Chunle Guo
Yuting Yang
J. Chen
Chongyi Li
VGen
135
1
0
04 Apr 2025
HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
Yongji Wu
Xueshen Liu
Shuowei Jin
Ceyu Xu
Feng Qian
Ziming Mao
Matthew Lentz
Danyang Zhuo
Ion Stoica
MoMe
MoE
100
0
0
04 Apr 2025
FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention
Huangliang Dai
Shixun Wu
Hairui Zhao
Jiajun Huang
Zizhe Jian
Yue Zhu
Haiyang Hu
Zizhong Chen
66
0
0
03 Apr 2025
A Framework for Robust Cognitive Evaluation of LLMs
Karin de Langis
J. Park
Bin Hu
Khanh Chi Le
Andreas Schramm
Michael C. Mensink
Andrew Elfenbein
Dongyeop Kang
86
0
0
03 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
279
0
0
03 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
173
0
0
02 Apr 2025
Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding
Sakhinana Sagar Srinivas
Akash Das
Shivam Gupta
Venkataramana Runkana
OffRL
120
1
0
02 Apr 2025
Urban Computing in the Era of Large Language Models
Zhonghang Li
Lianghao Xia
Xubin Ren
J. Tang
Tianyi Chen
Yong-mei Xu
Chenyu Huang
218
0
0
02 Apr 2025
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
Runlong Zhou
Yi Zhang
RALM
88
0
0
02 Apr 2025
ParallelFlow: Parallelizing Linear Transformers via Flow Discretization
Nicola Muca Cirone
C. Salvi
117
2
0
01 Apr 2025
TransMamba: Flexibly Switching between Transformer and Mamba
Yixing Li
Ruobing Xie
Zhen Yang
Xingwu Sun
Shuaipeng Li
...
Zhanhui Kang
Yu Cheng
C. Xu
Di Wang
Jie Jiang
Mamba
152
2
0
31 Mar 2025
Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference
Wei Tao
Bin Zhang
Xiaoyang Qu
Jiguang Wan
Jianzong Wang
128
2
0
30 Mar 2025
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo
Yawei Li
Taolin Zhang
Jiadong Wang
Tao Dai
Shu-Tao Xia
Luca Benini
157
5
0
30 Mar 2025
Large Language and Reasoning Models are Shallow Disjunctive Reasoners
Irtaza Khalid
Amir Masoud Nourollah
Steven Schockaert
LRM
180
1
0
30 Mar 2025
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Xinyu Wang
Linrui Ma
Jerry Huang
Peng Lu
Prasanna Parthasarathi
Xiao-Wen Chang
Boxing Chen
Yufei Cui
KELM
130
1
0
28 Mar 2025
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Tong Nie
Jian Sun
Wei Ma
212
4
0
27 Mar 2025
A Multi-Modal Knowledge-Enhanced Framework for Vessel Trajectory Prediction
Haomin Yu
Tianyi Li
Kristian Torp
Christian S. Jensen
55
1
0
27 Mar 2025
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
Dongchen Lu
Yuyao Sun
Zilu Zhang
Leping Huang
Jianliang Zeng
Mao Shu
Huo Cao
140
4
0
27 Mar 2025
Inductive Link Prediction on N-ary Relational Facts via Semantic Hypergraph Reasoning
Gongzhu Yin
Hao Zhang
Yuchen Yang
Yihao Luo
LRM
112
1
0
26 Mar 2025
Named Entity Recognition in Context
Colin Brisson
Ayoub Kahfy
Marc Bui
Frédéric Constant
138
0
0
26 Mar 2025
UniEDU: A Unified Language and Vision Assistant for Education Applications
Zhendong Chu
Jian Xie
Shen Wang
Ziyi Wang
Qingsong Wen
AI4Ed
150
0
0
26 Mar 2025
Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder
Changye Li
Weizhe Xu
Serguei V. S. Pakhomov
Ellen Bradley
Dror Ben-Zeev
T. Cohen
109
0
0
25 Mar 2025
GIViC: Generative Implicit Video Compression
Ge Gao
Siyue Teng
Tianhao Peng
Fan Zhang
David Bull
DiffM
VGen
79
0
0
25 Mar 2025