Title |
---|
![]() EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Yuxuan Zhang Tianheng Cheng Lianghui Zhu Lei Liu Heng Liu Longjin Ran Xiaoxin Chen Xiaoxin Chen Wenyu Liu Xinggang Wang |
![]() Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report Franz Louis Cesista |
![]() What If We Recaption Billions of Web Images with LLaMA-3? Xianhang Li Haoqin Tu Mude Hui Zeyu Wang Bingchen Zhao ...Jieru Mei Qing Liu Huangjie Zheng Yuyin Zhou Cihang Xie |