2504.21403
Cited By
Static or Dynamic: Towards Query-Adaptive Token Selection for Video Question Answering
30 April 2025
Yumeng Shi
Quanyu Long
Wenya Wang
Papers citing
"Static or Dynamic: Towards Query-Adaptive Token Selection for Video Question Answering"
6 / 6 papers shown
Title
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models
Tianyu Fu
Tengxuan Liu
Qinghao Han
Guohao Dai
Shengen Yan
H. Yang
Xuefei Ning
Yu Wang
42
7
0
30 Dec 2024
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression
Yefei He
Feng Chen
Jing Liu
Wenqi Shao
Hong Zhou
Kai Zhang
Bohan Zhuang
VLM
80
13
0
11 Oct 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yongdong Luo
Lei Li
Shuhuai Ren
...
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
VLM
MLLM
126
379
0
31 May 2024
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
83
289
0
17 Aug 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Fei Huang
Jingren Zhou
VLM
MLLM
273
948
0
27 Apr 2023
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
95
454
0
17 Oct 2022