Title |
---|
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models. Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, ..., Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang |
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, ..., Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang |
Emu3: Next-Token Prediction is All You Need. Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo, Quan-Sen Sun, Yufeng Cui, ..., Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang |
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models. Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, ..., Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi |