ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
arXiv: 2411.04923

7 November 2024
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
Fahad Shahbaz Khan
Salman Khan
Tags: MLLM, VGen, VLM

Papers citing "VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos"

2 / 52 papers shown

1. Video Object Segmentation with Language Referring Expressions
   Anna Khoreva, Anna Rohrbach, Bernt Schiele
   VOS | 53 | 194 | 0 | 21 Mar 2018

2. COCO-Stuff: Thing and Stuff Classes in Context
   Holger Caesar, J. Uijlings, V. Ferrari
   116 | 1,384 | 0 | 12 Dec 2016