
arXiv:2408.01319

A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

2 August 2024
Jiaqi Wang, Hanqi Jiang, Yi-Hsueh Liu, Chong Ma, Xu-Yao Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li, Yutong Zhang, Zihao Wu, Zheng Liu, Tianyang Zhong, Bao Ge, Tuo Zhang, Ning Qiang, Xintao Hu, Xi Jiang, Xin Zhang, Wei Zhang, Dinggang Shen, Tianming Liu, Shu Zhang
Communities: VLM, AI4TS
Abstract

In an era defined by the explosive growth of data and rapid technological advancement, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types, including text, images, videos, audio, and physiological sequences, MLLMs address the complexities of real-world applications far beyond the capabilities of single-modality systems. In this paper, we systematically survey the applications of MLLMs in multimodal tasks spanning natural language, vision, and audio. We also provide a comparative analysis of the tasks that different MLLMs focus on, discuss the shortcomings of current MLLMs, and suggest potential directions for future research. Through these discussions, this paper aims to provide valuable insights for the further development and application of MLLMs.
