Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis

16 April 2025
Shravan Chaudhari, Trilokya Akula, Yoon Kim, Tom Blake
Abstract

In this paper, we advance the study of AI-augmented reasoning in the context of Human-Computer Interaction (HCI), psychology, and cognitive science, focusing on the critical task of visual perception. Specifically, we investigate the applicability of Multimodal Large Language Models (MLLMs) in this domain. To this end, we leverage established principles and explanations from psychology and cognitive science related to complexity in human visual perception, and use them as guiding principles for the MLLMs to compare and interpret visual content. Our study aims to benchmark MLLMs across various explainability principles relevant to visual perception. Unlike recent approaches that primarily employ advanced deep learning models to predict complexity metrics from visual content, our work does not seek to develop yet another predictive model. Instead, we propose a novel annotation-free analytical framework for assessing the utility of MLLMs as cognitive assistants for HCI tasks, using visual perception as a case study. The primary goal is to pave the way for a principled study of quantifying and evaluating the interpretability of MLLMs for applications in improving human reasoning capability and uncovering biases in existing perception datasets annotated by humans.
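
To make the idea of an annotation-free, principle-guided MLLM query concrete, here is a minimal sketch of the kind of comparison the abstract describes: prompting a vision-capable model to judge which of two images is more visually complex, grounding its explanation in psychology-derived principles. The model name ("gpt-4o"), the principle list, the prompt wording, and the helper functions are illustrative assumptions by this editor, not the authors' protocol; the sketch assumes an OpenAI-compatible vision API.

# Hypothetical sketch: principle-guided visual complexity comparison with an MLLM.
# All names, principles, and prompt text are assumptions, not the paper's method.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Example guiding principles from the visual-complexity literature (illustrative).
PRINCIPLES = [
    "number and variety of distinct objects",
    "clutter and density of visual elements",
    "symmetry and regularity of layout",
    "color variability",
]

def encode_image(path: str) -> str:
    """Read an image file and return it as a base64 data URL."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

def compare_complexity(image_a: str, image_b: str) -> str:
    """Ask the MLLM which image is more visually complex and why,
    grounding the explanation in the listed principles."""
    prompt = (
        "Compare the visual complexity of Image A and Image B. "
        "For each of the following principles, state which image scores higher "
        "and briefly justify your judgment: " + "; ".join(PRINCIPLES) + ". "
        "Finish with an overall verdict (A, B, or tie)."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any vision-capable chat model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": encode_image(image_a)}},
                {"type": "image_url", "image_url": {"url": encode_image(image_b)}},
            ],
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(compare_complexity("scene_a.jpg", "scene_b.jpg"))

Because the model is asked to justify its judgment principle by principle, the output can be inspected and compared against human complexity ratings without requiring any new annotations, which is the spirit of the framework described above.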

@article{chaudhari2025_2504.12511,
  title={Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis},
  author={Shravan Chaudhari and Trilokya Akula and Yoon Kim and Tom Blake},
  journal={arXiv preprint arXiv:2504.12511},
  year={2025}
}