Title |
---|
![]() Emu3: Next-Token Prediction is All You Need Xinlong Wang Xiaosong Zhang Zhengxiong Luo Quan-Sen Sun Yufeng Cui ...Xi Yang Jingjing Liu Yonghua Lin Tiejun Huang Zhongyuan Wang |
![]() MIO: A Foundation Model on Multimodal Tokens Zekun Wang King Zhu Chunpu Xu Wangchunshu Zhou Jiaheng Liu ...Yuanxing Zhang Ge Zhang Ke Xu Jie Fu Wenhao Huang |
![]() Molmo and PixMo: Open Weights and Open Data for State-of-the-Art
Multimodal Models Matt Deitke Christopher Clark Sangho Lee Rohun Tripathi Yue Yang ...Noah A. Smith Hannaneh Hajishirzi Ross Girshick Ali Farhadi Aniruddha Kembhavi |
![]() JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images Zhecan Wang Junzhang Liu Chia-Wei Tang Hani Alomari Anushka Sivakumar ...Haoxuan You A. Ishmam Kai-Wei Chang Shih-Fu Chang Chris Thomas |
![]() UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios Baichuan Zhou Haote Yang Dairong Chen Junyan Ye Tianyi Bai Jinhua Yu Songyang Zhang Dahua Lin Conghui He Weijia Li |