ResearchTrend.AI
  3. 2405.16640
A Survey of Multimodal Large Language Model from A Data-centric Perspective

26 May 2024
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
Shiyu Li
Ling Yang
Bozhou Li
Yifan Wang
Bin Cui
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
Abstract

Multimodal large language models (MLLMs) enhance the capabilities of standard large language models by integrating and processing data from multiple modalities, including text, vision, audio, video, and 3D environments. Data plays a pivotal role in the development and refinement of these models. In this survey, we comprehensively review the literature on MLLMs from a data-centric perspective. Specifically, we explore methods for preparing multimodal data during the pretraining and adaptation phases of MLLMs. Additionally, we analyze the evaluation methods for the datasets and review the benchmarks for evaluating MLLMs. Our survey also outlines potential future research directions. This work aims to provide researchers with a detailed understanding of the data-driven aspects of MLLMs, fostering further exploration and innovation in this field.
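As a concrete illustration of the data-preparation work the survey reviews, a common first step in assembling multimodal pretraining corpora is rule-based quality filtering of image-text pairs. The sketch below is a minimal, hypothetical example: the function name, thresholds, and heuristics are assumptions for illustration, not a method taken from the survey.

```python
# Hypothetical sketch of a rule-based quality filter for image-text
# pretraining pairs. Thresholds and heuristics are illustrative only.

def filter_pairs(pairs, min_caption_words=3, max_caption_words=64):
    """Keep (image_path, caption) pairs whose captions look usable:
    within a word-count range and not exact duplicates."""
    seen = set()
    kept = []
    for image_path, caption in pairs:
        caption = caption.strip()
        n_words = len(caption.split())
        if not (min_caption_words <= n_words <= max_caption_words):
            continue  # too short to be informative, or too long to be a caption
        key = caption.lower()
        if key in seen:
            continue  # drop exact duplicate captions
        seen.add(key)
        kept.append((image_path, caption))
    return kept

pairs = [
    ("img1.jpg", "A dog chasing a ball in the park"),
    ("img2.jpg", "ok"),                                # too short, dropped
    ("img3.jpg", "A dog chasing a ball in the park"),  # duplicate, dropped
]
print(filter_pairs(pairs))  # → [("img1.jpg", "A dog chasing a ball in the park")]
```

Real pipelines layer further signals on top of such heuristics (e.g., image-text similarity scores and perceptual deduplication), which is part of what makes a data-centric view of MLLM training worth surveying.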
