Title |
---|
![]() MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA Hanrong Ye Haotian Zhang Erik Daxberger Lin Chen Zongyu Lin ...Haoxuan You Dan Xu Zhe Gan Jiasen Lu Yinfei Yang |
![]() JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images Zhecan Wang Junzhang Liu Chia-Wei Tang Hani Alomari Anushka Sivakumar ...Haoxuan You A. Ishmam Kai-Wei Chang Shih-Fu Chang Chris Thomas |
![]() Large Language Models are Strong Audio-Visual Speech Recognition Learners Umberto Cappellazzo Minsu Kim Honglie Chen Pingchuan Ma Stavros Petridis Daniele Falavigna Alessio Brutti Maja Pantic |
![]() UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios Baichuan Zhou Haote Yang Dairong Chen Junyan Ye Tianyi Bai Jinhua Yu Songyang Zhang Dahua Lin Conghui He Weijia Li |
![]() VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models Haodong Duan Junming Yang Junming Yang Xinyu Fang Lin Chen ...Yuhang Zang Pan Zhang Jiaqi Wang Dahua Lin Kai Chen |