ResearchTrend.AI

BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance

4 June 2025
Huy Le
Nhat Chung
Tung Kieu
Anh Nguyen
Ngan Le
Main: 9 pages · Appendix: 10 pages · Bibliography: 3 pages · 17 figures · 10 tables
Abstract

Text-video retrieval (TVR) systems often suffer from visual-linguistic biases present in datasets, which cause pre-trained vision-language models to overlook key details. To address this, we propose BiMa, a novel framework designed to mitigate biases in both visual and textual representations. Our approach begins by generating scene elements that characterize each video by identifying relevant entities/objects and activities. For visual debiasing, we integrate these scene elements into the video embeddings, enhancing them to emphasize fine-grained and salient details. For textual debiasing, we introduce a mechanism to disentangle text features into content and bias components, enabling the model to focus on meaningful content while separately handling biased information. Extensive experiments and ablation studies across five major TVR benchmarks (i.e., MSR-VTT, MSVD, LSMDC, ActivityNet, and DiDeMo) demonstrate the competitive performance of BiMa. Additionally, the model's bias mitigation capability is consistently validated by its strong results on out-of-distribution retrieval tasks.
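The two debiasing steps described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the mixing weight `alpha`, the linear heads `W_content`/`W_bias`, and the random toy embeddings stand in for BiMa's actual scene-element encoder and disentanglement mechanism, whose details are in the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy embedding dimension

# Toy embeddings standing in for encoder outputs.
video_emb = rng.normal(size=d)
scene_elem_embs = rng.normal(size=(3, d))  # e.g. entities/objects, activities
text_emb = rng.normal(size=d)

# Visual debiasing (sketch): blend scene-element information into the
# video embedding so fine-grained, salient details are emphasized.
alpha = 0.3  # hypothetical mixing weight
video_debiased = (1 - alpha) * video_emb + alpha * scene_elem_embs.mean(axis=0)

# Textual debiasing (sketch): two hypothetical linear heads split the
# text feature into a content component and a bias component.
W_content = rng.normal(size=(d, d))
W_bias = rng.normal(size=(d, d))
content = W_content @ text_emb
bias = W_bias @ text_emb

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Retrieval scores only the content component against the debiased video;
# the bias component is handled separately (e.g. regularized away).
score = cosine(video_debiased, content)
```

In a real system the content/bias split would be trained (e.g. with a disentanglement objective) rather than random, and the scene elements would come from detected entities and activities rather than noise; the sketch only shows where each component enters the similarity computation.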

@article{le2025_2506.03589,
  title={BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance},
  author={Huy Le and Nhat Chung and Tung Kieu and Anh Nguyen and Ngan Le},
  journal={arXiv preprint arXiv:2506.03589},
  year={2025}
}