Title |
---|
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models. Zhengfeng Lai, Vasileios Saveris, C. L. P. Chen, Hong-You Chen, Haotian Zhang, ... Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang |
What If We Recaption Billions of Web Images with LLaMA-3? Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, ... Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie |