Title |
---|
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, ... Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang |
MIO: A Foundation Model on Multimodal Tokens. Zekun Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, ... Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang |
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines. Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanmin Wu, Jiayi Lei, ... Guanglu Song, Peng Gao, Yu Liu, Chunyuan Li, Hongsheng Li |
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation |
VITA: Towards Open-Source Interactive Omni Multimodal LLM. Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, ... Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun |
MAVIS: Mathematical Visual Instruction Tuning. Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Yichi Zhang, Ziyu Guo, ... Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Hongsheng Li |