Title |
---|
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, ..., Qingsong Wen, Zhang Zhang, Liwen Wang, Rong Jin, Tieniu Tan |
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents. Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, ..., Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen |
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding. Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, ..., Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen |
Needle In A Multimodal Haystack. Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, ..., Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang |
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception. Run Luo, Yunshui Li, Longze Chen, Wanwei He, Ting-En Lin, ..., Zikai Song, Xiaobo Xia, Tongliang Liu, Min Yang, Binyuan Hui |