Title |
---|
Merlin: Empowering Multimodal LLMs with Foresight Minds. En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, ... Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao |
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models. Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal |
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model. Avamarie Brueggeman, Andrea Madotto, Zhaojiang Lin, Tushar Nagarajan, Matt Smith, ... Peyman Heidari, Yue Liu, Kavya Srinet, Babak Damavandi, Anuj Kumar |
DreamLLM: Synergistic Multimodal Comprehension and Creation. Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, ... Hao-Ran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi |