The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs

2 October 2024
Hong Li
Nanxi Li
Yuanjie Chen
Jianbin Zhu
Qinlu Guo
Cewu Lu
Yong-Lu Li
    MLLM
Abstract

Multi-modal Large Language Models (MLLMs) have exhibited impressive capability. However, many deficiencies of MLLMs have recently been found compared to human intelligence, e.g., hallucination. To drive MLLM research, the community has dedicated efforts to building larger benchmarks with complex tasks. In this paper, we propose benchmarking an essential but usually overlooked intelligence: association, a human's basic capability to link observation and prior practice memory. To comprehensively investigate MLLMs' performance on association, we formulate the association task and devise a standard benchmark based on adjective and verb semantic concepts. Instead of costly data annotation and curation, we propose a convenient annotation-free construction method that transforms a general dataset for our association tasks. Simultaneously, we devise a rigorous data refinement process to eliminate confusion in the raw dataset. Building on this database, we establish three levels of association tasks: single-step, synchronous, and asynchronous associations. Moreover, we conduct a comprehensive investigation into MLLMs' zero-shot association capabilities across multiple dimensions, including three distinct memory strategies, both open-source and closed-source MLLMs, cutting-edge Mixture-of-Experts (MoE) models, and the involvement of human experts. Our systematic investigation shows that current open-source MLLMs consistently exhibit poor capability in our association tasks, and even the state-of-the-art GPT-4V (vision) shows a significant gap compared to humans. We believe our benchmark will pave the way for future MLLM studies. Our data and code are available at: this https URL.
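As a rough illustration of the annotation-free construction idea described in the abstract, the minimal Python sketch below reuses adjective/verb concept labels that already exist in a source dataset to assemble single-step association items, so no new human annotation is required. The Sample structure, function name build_single_step_tasks, and the specific pairing rule (positive shares a concept with the query, distractor shares none) are illustrative assumptions, not the paper's released pipeline.

import random
from dataclasses import dataclass

@dataclass
class Sample:
    image_path: str
    concepts: set[str]  # adjective/verb concepts already labeled in the source dataset

def build_single_step_tasks(samples, num_tasks, seed=0, max_attempts=10000):
    """Sample (query, positive, distractor, shared_concept) tuples.

    The positive image shares at least one concept with the query; the
    distractor shares none, so answering requires associating via the concept.
    """
    rng = random.Random(seed)
    tasks, attempts = [], 0
    while len(tasks) < num_tasks and attempts < max_attempts:
        attempts += 1
        query, positive = rng.sample(samples, 2)
        shared = query.concepts & positive.concepts
        if not shared:
            continue  # the pair must be linkable through some shared concept
        distractor = rng.choice(samples)
        if distractor.concepts & query.concepts:
            continue  # distractor must not share any concept with the query
        tasks.append((query.image_path, positive.image_path,
                      distractor.image_path, rng.choice(sorted(shared))))
    return tasks

if __name__ == "__main__":
    toy = [
        Sample("a.jpg", {"red", "running"}),
        Sample("b.jpg", {"red", "sitting"}),
        Sample("c.jpg", {"blue", "sleeping"}),
    ]
    for task in build_single_step_tasks(toy, num_tasks=2):
        print(task)

Synchronous and asynchronous tasks would then chain such single-step links, but their exact construction follows the paper's released data and code.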

@article{li2025_2410.01417,
  title={The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs},
  author={Hong Li and Nanxi Li and Yuanjie Chen and Jianbin Zhu and Qinlu Guo and Cewu Lu and Yong-Lu Li},
  journal={arXiv preprint arXiv:2410.01417},
  year={2025}
}