VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models

23 February 2025
Jen-Tse Huang
Dasen Dai
Jen-Yuan Huang
Youliang Yuan
Xiaoyuan Liu
Wenxuan Wang
Wenxiang Jiao
Pinjia He
Zhaopeng Tu
Abstract

Multimodal Large Language Models (MLLMs) have demonstrated remarkable advancements in multimodal understanding; however, their fundamental visual cognitive abilities remain largely underexplored. To bridge this gap, we introduce VisFactor, a novel benchmark derived from the Factor-Referenced Cognitive Test (FRCT), a well-established psychometric assessment of human cognition. VisFactor digitalizes vision-related FRCT subtests to systematically evaluate MLLMs across essential visual cognitive tasks including spatial reasoning, perceptual speed, and pattern recognition. We present a comprehensive evaluation of state-of-the-art MLLMs, such as GPT-4o, Gemini-Pro, and Qwen-VL, using VisFactor under diverse prompting strategies like Chain-of-Thought and Multi-Agent Debate. Our findings reveal a concerning deficiency in current MLLMs' fundamental visual cognition, with performance frequently approaching random guessing and showing only marginal improvements even with advanced prompting techniques. These results underscore the critical need for focused research to enhance the core visual reasoning capabilities of MLLMs. To foster further investigation in this area, we release our VisFactor benchmark at this https URL.
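The abstract reports that MLLM accuracy on several subtests "frequently approach[es] random guessing." The sketch below (Python) illustrates one way such a comparison could be computed for a multiple-choice subtest: score the model's selected options against the answer key and compare the resulting accuracy to the expected accuracy of uniform random guessing. This is a minimal illustration, not the authors' evaluation code; the item fields and subtest structure are assumptions made for the example.

"""Minimal sketch (not the authors' code): score a hypothetical
multiple-choice visual-cognition subtest and compare model accuracy
to the random-guess baseline."""

from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """One hypothetical multiple-choice item from a VisFactor-style subtest."""
    question_id: str
    num_options: int      # e.g. 4 or 5 answer choices
    correct_option: int   # index of the correct choice
    model_option: int     # index the MLLM selected


def accuracy(items: List[Item]) -> float:
    """Fraction of items the model answered correctly."""
    return sum(it.model_option == it.correct_option for it in items) / len(items)


def chance_level(items: List[Item]) -> float:
    """Expected accuracy of uniform random guessing over the same items."""
    return sum(1.0 / it.num_options for it in items) / len(items)


if __name__ == "__main__":
    # Toy subtest: three 4-option items, of which the model gets one right.
    subtest = [
        Item("sr_01", 4, correct_option=2, model_option=2),
        Item("sr_02", 4, correct_option=0, model_option=3),
        Item("sr_03", 4, correct_option=1, model_option=0),
    ]
    print(f"accuracy = {accuracy(subtest):.2f}, "
          f"chance level = {chance_level(subtest):.2f}")

A model whose accuracy stays near the chance level across a subtest provides little evidence of the underlying visual-cognitive ability that subtest is meant to measure, which is the pattern the paper reports for several current MLLMs.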

@article{huang2025_2502.16435,
  title={VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models},
  author={Jen-Tse Huang and Dasen Dai and Jen-Yuan Huang and Youliang Yuan and Xiaoyuan Liu and Wenxuan Wang and Wenxiang Jiao and Pinjia He and Zhaopeng Tu},
  journal={arXiv preprint arXiv:2502.16435},
  year={2025}
}