Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding

14 June 2025
Youze Wang
Zijun Chen
Ruoyu Chen
Shishen Gu
Yinpeng Dong
Hang Su
Jun Zhu
Meng Wang
Richang Hong
Wenbo Hu
Main: 13 Pages
62 Figures
Bibliography: 2 Pages
22 Tables
Appendix: 34 Pages
Abstract

Recent advancements in multimodal large language models for video understanding (videoLLMs) have improved their ability to process dynamic multimodal data. However, trustworthiness challenges, including factual inaccuracies, harmful content, biases, hallucinations, and privacy risks, undermine their reliability because of the spatiotemporal complexity of video data. This study introduces Trust-videoLLMs, a comprehensive benchmark that evaluates videoLLMs across five dimensions: truthfulness, safety, robustness, fairness, and privacy. Comprising 30 tasks built from adapted, synthetic, and annotated videos, the framework assesses dynamic visual scenarios, cross-modal interactions, and real-world safety concerns. Our evaluation of 23 state-of-the-art videoLLMs (5 commercial, 18 open-source) reveals significant limitations in dynamic visual scene understanding and in resilience to cross-modal perturbations. Open-source videoLLMs occasionally outperform commercial models on truthfulness but are less credible overall, and data diversity matters more than scale. These findings highlight the need for advanced safety alignment to enhance capabilities. Trust-videoLLMs provides a publicly available, extensible toolbox for standardized trustworthiness assessments, bridging the gap between accuracy-focused benchmarks and critical demands for robustness, safety, fairness, and privacy.

@article{wang2025_2506.12336,
  title={Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding},
  author={Youze Wang and Zijun Chen and Ruoyu Chen and Shishen Gu and Yinpeng Dong and Hang Su and Jun Zhu and Meng Wang and Richang Hong and Wenbo Hu},
  journal={arXiv preprint arXiv:2506.12336},
  year={2025}
}