Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

1 August 2024

Papers citing "Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model"


Title: VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Authors: Yue Fan, Xiaojian Ma, Rujie Wu, Yuntao Du, Jiaqi Li, Zhi Gao, Qing Li
Tags: VLM, LLMAG
Counts: 51, 59, 0
Date: 18 Mar 2024

Title: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Authors: Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
Tags: VLM, MLLM
Counts: 322, 4,300, 0
Date: 30 Jan 2023