
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Papers citing "VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos"
50 / 52 papers shown
Title |
---|
GLaMM: Pixel Grounding Large Multimodal Model. H. Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman M. Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan |