Title |
---|
MIO: A Foundation Model on Multimodal Tokens. Zekun Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, ..., Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang |
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions. Tsung-Han Wu, Joseph E. Gonzalez, Trevor Darrell, David M. Chan |
Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving. Kairui Ding, Boyuan Chen, Yuchen Su, Huan-ang Gao, Bu Jin, ..., Wuqiang Zhang, Xiaohui Li, Paul Barsch, Hongyang Li, Hao Zhao |
LLaVA-Chef: A Multi-modal Generative Model for Food Recipes. Fnu Mohbat, Mohammed J. Zaki |