Title |
---|
![]() MIO: A Foundation Model on Multimodal Tokens Zekun Wang King Zhu Chunpu Xu Wangchunshu Zhou Jiaheng Liu ...Yuanxing Zhang Ge Zhang Ke Xu Jie Fu Wenhao Huang |
![]() MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Yi-Fan Zhang Huanyu Zhang Haochen Tian Chaoyou Fu Shuangqing Zhang ...Qingsong Wen Zhang Zhang Liwen Wang Rong Jin Tieniu Tan |
![]() EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model Feipeng Ma Yizhou Zhou Hebei Li Zilong He Siying Wu Fengyun Rao Siying Wu Fengyun Rao Yueyi Zhang Xiaoyan Sun |
![]() Visual Agents as Fast and Slow Thinkers Guangyan Sun Mingyu Jin Zhenting Wang Cheng-Long Wang Siqi Ma Qifan Wang Ying Nian Wu Ying Nian Wu Dongfang Liu Dongfang Liu |
![]() VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models Haodong Duan Junming Yang Junming Yang Xinyu Fang Lin Chen ...Yuhang Zang Pan Zhang Jiaqi Wang Dahua Lin Kai Chen |