arXiv:2511.18399

ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering

23 November 2025
Yuxiang Nie
Han Wang
Yongjie Ye
Haiyang Yu
Weitao Jia
Tao Zeng
Hao Feng
Xiang Fei
Yang Li
Xiaohui Lv
Guozhi Tang
J. Tang
Jinghui Lu
Zehui Dai
Jiacong Wang
Dingkang Yang
An-Lan Wang
Can Huang
Main: 9 pages; Bibliography: 3 pages; Appendix: 2 pages; 8 figures; 3 tables
Abstract

This paper introduces ChineseVideoBench, a pioneering benchmark specifically designed for evaluating Multimodal Large Language Models (MLLMs) in Chinese Video Question Answering. The growing demand for sophisticated video analysis capabilities highlights the critical need for comprehensive, culturally aware evaluation frameworks. ChineseVideoBench addresses this gap by providing a robust dataset and tailored evaluation metrics, enabling rigorous assessment of state-of-the-art MLLMs on complex Chinese video content. Specifically, ChineseVideoBench comprises 8 main classes and 12 sub-classes, encompassing tasks that demand both deep video understanding and nuanced Chinese linguistic and cultural awareness. Our empirical evaluations reveal that ChineseVideoBench presents a significant challenge to current MLLMs. Among the models assessed, Gemini 2.5 Pro achieves the highest performance with an overall score of 77.9%, while InternVL-38B emerges as the most competitive open-source model.
