Title |
---|
![]() VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths
Vision Computation |
![]() A Survey on Evaluation of Multimodal Large Language Models Jiaxing Huang Jingyi Zhang |
![]() MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Yi-Fan Zhang Huanyu Zhang Haochen Tian Chaoyou Fu Shuangqing Zhang ...Qingsong Wen Zhang Zhang Liwen Wang Rong Jin Tieniu Tan |
![]() NAVERO: Unlocking Fine-Grained Semantics for Video-Language
Compositionality Chaofan Tao Gukyeong Kwon Varad Gunjal Hao Yang Zhaowei Cai Yonatan Dukler Ashwin Swaminathan R. Manmatha Colin Jon Taylor Stefano Soatto |