Title |
---|
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, ..., Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang |
Emu3: Next-Token Prediction is All You Need. Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo, Quan-Sen Sun, Yufeng Cui, ..., Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang |
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models. Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, ..., Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi |
VITA: Towards Open-Source Interactive Omni Multimodal LLM. Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, ..., Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun |
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models. Haodong Duan, Junming Yang, Xinyu Fang, Lin Chen, ..., Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen |