Title |
---|
![]() MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA Hanrong Ye Haotian Zhang Erik Daxberger Lin Chen Zongyu Lin ...Haoxuan You Dan Xu Zhe Gan Jiasen Lu Yinfei Yang |
![]() MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Yi-Fan Zhang Huanyu Zhang Haochen Tian Chaoyou Fu Shuangqing Zhang ...Qingsong Wen Zhang Zhang Liwen Wang Rong Jin Tieniu Tan |
![]() EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model Feipeng Ma Yizhou Zhou Hebei Li Zilong He Siying Wu Fengyun Rao Siying Wu Fengyun Rao Yueyi Zhang Xiaoyan Sun |
![]() ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation Cheng Yang Chufan Shi Yaxin Liu Bo Shui Junjie Wang ...Yuxiang Zhang Gongye Liu Xiaomei Nie Deng Cai Yujiu Yang |