HueManity: Probing Fine-Grained Visual Perception in MLLMs

31 May 2025
Rynaa Grover
Jayant Sravan Tamarapalli
Sahiti Yerramilli
Nilay Pande
Main: 7 pages · 3 figures · 3 tables · Bibliography: 3 pages · Appendix: 3 pages
Abstract

Multimodal Large Language Models (MLLMs) excel at high-level visual reasoning, but their performance on nuanced perceptual tasks remains surprisingly limited. We present HueManity, a benchmark designed to assess visual perception in MLLMs. The dataset comprises 83,850 images featuring two-character alphanumeric strings embedded in Ishihara-test-style dot patterns, challenging models on precise pattern recognition. Our evaluation of nine state-of-the-art MLLMs on HueManity demonstrates a significant performance deficit compared to human and traditional computer vision baselines. The best-performing MLLM achieved 33.6% accuracy on the numeric `easy' task and a striking 3% on the alphanumeric `hard' task. In contrast, human participants achieved near-perfect scores (100% and 95.6%), and a fine-tuned ResNet50 model reached accuracies of 96.5% and 94.5%. These results highlight a critical gap in the visual capabilities of current MLLMs. Our analysis further explores potential architectural and training-paradigm factors contributing to this perceptual gap. We open-source the HueManity dataset and code to foster further research into improving the perceptual robustness of MLLMs.
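
How such a stimulus might be produced can be illustrated with a short, hedged sketch. The snippet below is an illustrative assumption based only on the abstract's description (two-character alphanumeric strings hidden in Ishihara-test-style dot patterns), not the authors' released generation pipeline: it uses Pillow to rasterize the target string into a binary mask and then scatters colored dots whose palette depends on whether each dot lands on the characters. The function name, palettes, dot count, and font are hypothetical choices.

# Hedged sketch of an Ishihara-style dot-pattern stimulus hiding a two-character
# string, following the abstract's description. Colors, dot sizes, and the
# packing strategy are illustrative assumptions, not the authors' exact pipeline.
import random
from PIL import Image, ImageDraw, ImageFont

def make_ishihara_stimulus(text="A7", size=512, n_dots=1500, seed=0):
    rng = random.Random(seed)

    # Rasterize the target string into a binary mask.
    mask = Image.new("L", (size, size), 0)
    ImageDraw.Draw(mask).text(
        (size // 4, size // 3),
        text,
        fill=255,
        font=ImageFont.load_default(),  # assumption: a large TTF gives more realistic stimuli
    )

    # Scatter randomly sized dots; color each by whether its center falls on the text.
    img = Image.new("RGB", (size, size), (235, 235, 220))
    canvas = ImageDraw.Draw(img)
    figure_colors = [(200, 80, 60), (220, 120, 90)]      # assumed "pattern" palette
    ground_colors = [(120, 160, 110), (150, 180, 130)]   # assumed "background" palette
    for _ in range(n_dots):
        x, y = rng.randrange(size), rng.randrange(size)
        r = rng.randint(3, 9)
        on_text = mask.getpixel((x, y)) > 0
        color = rng.choice(figure_colors if on_text else ground_colors)
        canvas.ellipse((x - r, y - r, x + r, y + r), fill=color)
    return img

if __name__ == "__main__":
    make_ishihara_stimulus("A7").save("huemanity_style_example.png")

Reading the string back out of an image like this requires grouping dots by hue rather than by edges or luminance, which is the kind of fine-grained perceptual step the benchmark probes; a conventional classifier such as the fine-tuned ResNet50 mentioned above can be trained directly on images of this form.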

@article{grover2025_2506.03194,
  title={HueManity: Probing Fine-Grained Visual Perception in MLLMs},
  author={Rynaa Grover and Jayant Sravan Tamarapalli and Sahiti Yerramilli and Nilay Pande},
  journal={arXiv preprint arXiv:2506.03194},
  year={2025}
}