arXiv:2301.10295

Object Segmentation with Audio Context

4 January 2023
Kaihui Zheng
Yuqing Ren
Zixin Shen
Tianxu Qin
    VOS
Abstract

Visual objects often have acoustic signatures that are naturally synchronized with them in audio-bearing video recordings. In this project, we explore multimodal feature aggregation for the video instance segmentation task: we integrate audio features into our video segmentation model to form an audio-visual learning scheme. Our method builds on an existing video instance segmentation method that leverages rich contextual information across video frames. Since this is the first attempt to investigate audio-visual instance segmentation, we collect a novel dataset of 20 vocal classes with synchronized video and audio recordings. By using a combined decoder to fuse video and audio features, our model shows a slight improvement over the base model. Additionally, we demonstrate the effectiveness of the different modules through extensive ablations.
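
The abstract does not specify how the combined decoder fuses the two modalities. As a rough illustration of the general idea only, the sketch below concatenates a per-frame video feature vector with a time-aligned audio feature vector and applies a linear projection. The function names `fuse_frame` and `fuse_clip`, the feature sizes, and the concatenate-then-project design are all assumptions made for illustration, not the authors' architecture.

```python
import random


def fuse_frame(video_vec, audio_vec, weights):
    """Fuse one frame: concatenate both feature vectors, then project.

    weights is a list of rows; each row has len(video_vec) + len(audio_vec)
    entries and produces one output dimension. (Hypothetical design.)
    """
    joint = video_vec + audio_vec  # list concatenation, not element-wise sum
    return [sum(w * x for w, x in zip(row, joint)) for row in weights]


def fuse_clip(video_feats, audio_feats, weights):
    """Fuse a whole clip, assuming one audio vector per video frame."""
    if len(video_feats) != len(audio_feats):
        raise ValueError("video and audio must cover the same number of frames")
    return [fuse_frame(v, a, weights) for v, a in zip(video_feats, audio_feats)]


# Toy dimensions, chosen arbitrarily for the example.
random.seed(0)
num_frames, video_dim, audio_dim, out_dim = 4, 6, 3, 5
video = [[random.gauss(0, 1) for _ in range(video_dim)] for _ in range(num_frames)]
audio = [[random.gauss(0, 1) for _ in range(audio_dim)] for _ in range(num_frames)]
proj = [[random.gauss(0, 1) for _ in range(video_dim + audio_dim)]
        for _ in range(out_dim)]

fused = fuse_clip(video, audio, proj)
print(len(fused), len(fused[0]))  # frames, fused feature size
```

In a real model the projection would be learned and the fused features would feed a segmentation decoder. Attention-based fusion is another common choice, and the abstract does not say which the authors use.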
