MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning

6 August 2025
Quang-Trung Truong
Yuk-Kwan Wong
Vo Hoang Kim Tuyen Dang
Rinaldi Gotama
D. Nguyen
Sai-Kit Yeung
Main: 6 pages, 8 figures, 5 tables. Bibliography: 2 pages.
Abstract

Marine videos present significant challenges for video understanding due to the dynamics of marine objects and the surrounding environment, camera motion, and the complexity of underwater scenes. Existing video captioning datasets, typically focused on generic or human-centric domains, often fail to generalize to the complexities of the marine environment or to yield insights about marine life. To address these limitations, we propose a two-stage marine object-oriented video captioning pipeline. We introduce a comprehensive video understanding benchmark that leverages triplets of video, text, and segmentation masks to facilitate visual grounding and captioning, leading to improved marine video understanding, analysis, and generation. Additionally, we highlight the effectiveness of video splitting for detecting salient object transitions at scene changes, which significantly enriches the semantics of the captions. Our dataset and code have been released at this https URL.
