Title |
---|
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models Archiki Prasad Elias Stengel-Eskin Mohit Bansal |
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model Avamarie Brueggeman Andrea Madotto Zhaojiang Lin Tushar Nagarajan Matt Smith ... Peyman Heidari Yue Liu Kavya Srinet Babak Damavandi Anuj Kumar |
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision Haoning Wu Zicheng Zhang Erli Zhang Chaofeng Chen Liang Liao ... Chunyi Li Wenxiu Sun Qiong Yan Guangtao Zhai Weisi Lin |
DreamLLM: Synergistic Multimodal Comprehension and Creation Runpei Dong Chunrui Han Yuang Peng Zekun Qi Zheng Ge ... Hao-Ran Wei Xiangwen Kong Xiangyu Zhang Kaisheng Ma Li Yi |