ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.16640
  4. Cited By
A Survey of Multimodal Large Language Model from A Data-centric
  Perspective

A Survey of Multimodal Large Language Model from A Data-centric Perspective

26 May 2024
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
Shiyu Li
Ling Yang
Bozhou Li
Yifan Wang
Bin Cui
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
ArXivPDFHTML

Papers citing "A Survey of Multimodal Large Language Model from A Data-centric Perspective"

50 / 51 papers shown
Title
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Ruotian Peng
Haiying He
Yake Wei
Yandong Wen
D. Hu
VLM
39
0
0
09 Apr 2025
Multimodal Agricultural Agent Architecture (MA3): A New Paradigm for Intelligent Agricultural Decision-Making
Multimodal Agricultural Agent Architecture (MA3): A New Paradigm for Intelligent Agricultural Decision-Making
Zhuoning Xu
Jian Xu
M. Zhang
P. Wang
Chao Deng
Cheng-Lin Liu
26
0
0
07 Apr 2025
Unicorn: Text-Only Data Synthesis for Vision Language Model Training
Unicorn: Text-Only Data Synthesis for Vision Language Model Training
Xiaomin Yu
Pengxiang Ding
Wenjie Zhang
Siteng Huang
Songyang Gao
Chengwei Qin
Kejian Wu
Zhaoxin Fan
Ziyue Qiao
Donglin Wang
MLLM
SyDa
72
0
0
28 Mar 2025
Do Multimodal Large Language Models Understand Welding?
Do Multimodal Large Language Models Understand Welding?
Grigorii Khvatskii
Yong Suk Lee
Corey Angst
Maria Gibbs
Robert Landers
Nitesh V. Chawla
AI4CE
49
1
0
18 Mar 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia
Jiashi Li
Xiang Yue
Bo Li
Ping Nie
Kai Zou
Wenhu Chen
LRM
79
2
0
13 Mar 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
P. Wang
Zhongzhi Li
Fei Yin
Dekang Ran
Chenglin Liu
Cheng-Lin Liu
LRM
50
3
0
28 Feb 2025
MathClean: A Benchmark for Synthetic Mathematical Data Cleaning
MathClean: A Benchmark for Synthetic Mathematical Data Cleaning
Hao Liang
Meiyi Qiang
Heng Chang
Zefeng He
Yongzhen Guo
Z. Zhu
Wentao Zhang
Bin Cui
39
0
0
26 Feb 2025
MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models
MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models
Zhen Zhang
Yuqing Yang
Kai Zhen
Nathan Susanj
Athanasios Mouchtaris
Siegfried Kunzmann
Zheng Zhang
54
0
0
17 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
127
9
0
05 Feb 2025
A Review of Multimodal Explainable Artificial Intelligence: Past,
  Present and Future
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Shilin Sun
Wenbin An
Feng Tian
Fang Nan
Qidong Liu
Xiaozhong Liu
N. Shah
Ping Chen
96
2
0
18 Dec 2024
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
Ruichuan An
Sihan Yang
Ming Lu
Kai Zeng
Yulin Luo
...
Hao Liang
Qi She
Shanghang Zhang
W. Zhang
Wentao Zhang
90
5
0
18 Nov 2024
EVQAScore: A Fine-grained Metric for Video Question Answering Data Quality Evaluation
EVQAScore: A Fine-grained Metric for Video Question Answering Data Quality Evaluation
Hao Liang
Zirong Chen
W. Zhang
Wentao Zhang
36
1
0
11 Nov 2024
Sample then Identify: A General Framework for Risk Control and
  Assessment in Multimodal Large Language Models
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
Qingni Wang
Tiantian Geng
Zhiyuan Wang
Teng Wang
Bo Fu
Feng Zheng
30
5
0
10 Oct 2024
Gradual Learning: Optimizing Fine-Tuning with Partially Mastered
  Knowledge in Large Language Models
Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models
Bozhou Li
Hao Liang
Yang Li
Fangcheng Fu
Hongzhi Yin
Conghui He
Wentao Zhang
KELM
CLL
48
0
0
08 Oct 2024
Data Proportion Detection for Optimized Data Management for Large
  Language Models
Data Proportion Detection for Optimized Data Management for Large Language Models
Hao Liang
Keshi Zhao
Yajie Yang
Bin Cui
Guosheng Dong
Zenan Zhou
Wentao Zhang
33
0
0
26 Sep 2024
Surveying the MLLM Landscape: A Meta-Review of Current Surveys
Surveying the MLLM Landscape: A Meta-Review of Current Surveys
Ming Li
Keyu Chen
Ziqian Bi
Ming Liu
Benji Peng
...
Jinlang Wang
Sen Zhang
X. Pan
Jiawei Xu
Pohsun Feng
OffRL
42
2
0
17 Sep 2024
Advancing Cyber Incident Timeline Analysis Through Rule Based AI and
  Large Language Models
Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models
Fatma Yasmine Loumachi
Mohamed Chahine Ghanem
AI4CE
40
0
0
04 Sep 2024
MathScape: Evaluating MLLMs in multimodal Math Scenarios through a
  Hierarchical Benchmark
MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark
Minxuan Zhou
Hao Liang
Tianpeng Li
Zhiyu Wu
Mingan Lin
...
Yujing Qiao
Weipeng Chen
Bin Cui
Wentao Zhang
Zenan Zhou
41
4
0
14 Aug 2024
Are Bigger Encoders Always Better in Vision Large Models?
Are Bigger Encoders Always Better in Vision Large Models?
Bozhou Li
Hao Liang
Zimo Meng
Wentao Zhang
VLM
40
3
0
01 Aug 2024
Synth-Empathy: Towards High-Quality Synthetic Empathy Data
Synth-Empathy: Towards High-Quality Synthetic Empathy Data
Hao Liang
Linzhuang Sun
Jingxuan Wei
Xijie Huang
Linkun Sun
Bihui Yu
Conghui He
Wentao Zhang
SyDa
42
4
0
31 Jul 2024
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision
  Language Models
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models
Zheng Liu
Hao Liang
Xijie Huang
Wentao Xiong
Qinhan Yu
Linzhuang Sun
Chong Chen
Conghui He
Bin Cui
Wentao Zhang
SyDa
49
0
0
30 Jul 2024
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
Daoyuan Chen
Haibin Wang
Yilun Huang
Ce Ge
Yaliang Li
Bolin Ding
Jingren Zhou
VLM
SyDa
63
0
0
16 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey
  from Co-Development Perspective
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
57
5
0
11 Jul 2024
PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
Miao Zheng
H. Liang
Fan Yang
Haoze Sun
Tianpeng Li
...
Kun Fang
Weipeng Chen
Bin Cui
Wentao Zhang
Zenan Zhou
RALM
42
3
0
08 Jul 2024
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for
  Text-to-Image Generation?
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Zhaorun Chen
Yichao Du
Zichen Wen
Yiyang Zhou
Chenhang Cui
...
Jiawei Zhou
Zhuokai Zhao
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
EGVM
MLLM
58
29
0
05 Jul 2024
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
Hao Liang
Jiapeng Li
Tianyi Bai
Xijie Huang
Linzhuang Sun
Zhengren Wang
Conghui He
Bin Cui
Chong Chen
Wentao Zhang
VGen
29
7
0
03 Jul 2024
Efficient-Empathy: Towards Efficient and Effective Selection of Empathy
  Data
Efficient-Empathy: Towards Efficient and Effective Selection of Empathy Data
Linzhuang Sun
Hao Liang
Jingxuan Wei
Linkun Sun
Bihui Yu
Bin Cui
Wentao Zhang
32
1
0
02 Jul 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
86
178
0
29 Feb 2024
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
Liangxin Liu
Xuebo Liu
Derek F. Wong
Dongfang Li
Ziyi Wang
Baotian Hu
Min Zhang
47
16
0
26 Feb 2024
LESS: Selecting Influential Data for Targeted Instruction Tuning
LESS: Selecting Influential Data for Targeted Instruction Tuning
Mengzhou Xia
Sadhika Malladi
Suchin Gururangan
Sanjeev Arora
Danqi Chen
80
186
0
06 Feb 2024
SelectLLM: Can LLMs Select Important Instructions to Annotate?
SelectLLM: Can LLMs Select Important Instructions to Annotate?
Long Lei
Jaehyung Kim
Yueming Jin
Dongyeop Kang
SyDa
37
10
0
29 Jan 2024
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large
  Datasets
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
158
1,016
0
25 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before
  Projection
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
194
591
0
16 Nov 2023
Large Language Models for Robotics: A Survey
Large Language Models for Robotics: A Survey
Fanlong Zeng
Wensheng Gan
Yongheng Wang
Ning Liu
Philip S. Yu
LM&Ro
124
125
0
13 Nov 2023
Active Instruction Tuning: Improving Cross-Task Generalization by
  Training on Prompt Sensitive Tasks
Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks
Po-Nien Kung
Fan Yin
Di Wu
Kai-Wei Chang
Nanyun Peng
77
40
0
01 Nov 2023
Constructing Image-Text Pair Dataset from Books
Constructing Image-Text Pair Dataset from Books
Yamato Okamoto
Haruto Toyonaga
Yoshihisa Ijiri
Hirokatsu Kataoka
55
2
0
03 Oct 2023
Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low
  Training Data Instruction Tuning
Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning
Haowen Chen
Yiming Zhang
Qi Zhang
Hantao Yang
Xiaomeng Hu
Xuetao Ma
Yifan YangGong
J. Zhao
ALM
69
47
0
16 May 2023
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
208
900
0
27 Apr 2023
The Vendi Score: A Diversity Evaluation Metric for Machine Learning
The Vendi Score: A Diversity Evaluation Metric for Machine Learning
Dan Friedman
Adji Bousso Dieng
EGVM
94
109
0
05 Oct 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
211
1,106
0
20 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,137
0
28 Jan 2022
Deduplicating Training Data Makes Language Models Better
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
242
593
0
14 Jul 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual
  Machine Learning
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
208
310
0
02 Mar 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
290
1,084
0
17 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
256
1,996
0
31 Dec 2020
Coherent Hierarchical Multi-Label Classification Networks
Coherent Hierarchical Multi-Label Classification Networks
Eleonora Giunchiglia
Thomas Lukasiewicz
AILaw
37
96
0
20 Oct 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
246
4,489
0
23 Jan 2020
Joint 2D-3D-Semantic Data for Indoor Scene Understanding
Joint 2D-3D-Semantic Data for Indoor Scene Understanding
Iro Armeni
S. Sax
Amir Zamir
Silvio Savarese
3DV
3DPC
115
876
0
03 Feb 2017
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in
  Natural Images
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukás Neumann
Jirí Matas
Serge J. Belongie
188
515
0
26 Jan 2016
12
Next