Title | Authors |
---|---|
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA | Hanrong Ye, Haotian Zhang, Erik Daxberger, Lin Chen, Zongyu Lin, ..., Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang |
TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings | Dawei Yan, Pengcheng Li, Yang Li, Hao Chen, Qingguo Chen, Weihua Luo, Wei Dong, Qingsen Yan, Haokui Zhang, Chunhua Shen |
VITA: Towards Open-Source Interactive Omni Multimodal LLM | Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, ..., Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun |