Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.16267
Cited By
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
21 October 2024
Michael S Ryoo
Honglu Zhou
Shrikant B. Kendre
Can Qin
Le Xue
Manli Shu
Silvio Savarese
Ran Xu
Caiming Xiong
Juan Carlos Niebles
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs"
2 / 2 papers shown
Title
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
57
0
0
02 May 2025
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
Yi-Xing Peng
Q. Yang
Yu-Ming Tang
Shenghao Fu
Kun-Yu Lin
Xihan Wei
Wei-Shi Zheng
45
0
0
25 Apr 2025
1