Title |
---|
![]() CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for
Foundation Models Zhong-Zhi Li Ming-Liang Zhang Fei Yin Zhi-Long Ji Jin-Feng Bai Zhen-Ru Pan Fan-Hu Zeng Jian Xu Jia-Xin Zhang Cheng-Lin Liu |
![]() What If We Recaption Billions of Web Images with LLaMA-3? Xianhang Li Haoqin Tu Mude Hui Zeyu Wang Bingchen Zhao ...Jieru Mei Qing Liu Huangjie Zheng Yuyin Zhou Cihang Xie |
![]() DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception Run Luo Yunshui Li Longze Chen Wanwei He Ting-En Lin ...Zikai Song Xiaobo Xia Tongliang Liu Min Yang Binyuan Hui |
![]() Mementos: A Comprehensive Benchmark for Multimodal Large Language Model
Reasoning over Image Sequences Xiyao Wang Yuhang Zhou Xiaoyu Liu Hongjin Lu Yuancheng Xu ...Taixi Lu Gedas Bertasius Mohit Bansal Huaxiu Yao Furong Huang |