ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
arXiv: 2411.04923

7 November 2024
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
Fahad Shahbaz Khan
Salman Khan
Tags: MLLM, VGen, VLM

Papers citing "VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos"

2 / 52 papers shown

1. Video Object Segmentation with Language Referring Expressions
   Anna Khoreva, Anna Rohrbach, Bernt Schiele
   VOS | 53 | 194 | 0 | 21 Mar 2018

2. COCO-Stuff: Thing and Stuff Classes in Context
   Holger Caesar, J. Uijlings, V. Ferrari
   116 | 1,384 | 0 | 12 Dec 2016