Aligning Multimodal LLM with Human Preference: A Survey

18 March 2025
Tao Yu
Yi-Fan Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
Kun Wang
Xingyu Lu
Yunhang Shen
Guibin Zhang
Dingjie Song
Yibo Yan
Tianlong Xu
Qingsong Wen
Zhang Zhang
Yan Huang
Liang Wang
Tieniu Tan
Abstract

Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs), built upon LLMs, have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed. This gap has spurred the emergence of various alignment algorithms, each targeting different application scenarios and optimization goals. Recent studies have shown that alignment algorithms are a powerful approach to resolving the aforementioned challenges. In this paper, we aim to provide a comprehensive and systematic review of alignment algorithms for MLLMs. Specifically, we explore four key aspects: (1) the application scenarios covered by alignment algorithms, including general image understanding, multi-image, video, and audio, and extended multimodal applications; (2) the core factors in constructing alignment datasets, including data sources, model responses, and preference annotations; (3) the benchmarks used to evaluate alignment algorithms; and (4) a discussion of potential future directions for the development of alignment algorithms. This work seeks to help researchers organize current advancements in the field and inspire better alignment methods. The project page of this paper is available at this https URL.

@article{yu2025_2503.14504,
  title={Aligning Multimodal LLM with Human Preference: A Survey},
  author={Tao Yu and Yi-Fan Zhang and Chaoyou Fu and Junkang Wu and Jinda Lu and Kun Wang and Xingyu Lu and Yunhang Shen and Guibin Zhang and Dingjie Song and Yibo Yan and Tianlong Xu and Qingsong Wen and Zhang Zhang and Yan Huang and Liang Wang and Tieniu Tan},
  journal={arXiv preprint arXiv:2503.14504},
  year={2025}
}