Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.08583
Cited By
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
11 July 2024
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective"
46 / 46 papers shown
Title
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
VLM
32
5
0
08 Aug 2024
Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling
Cong Xu
Gayathri Saranathan
Mahammad Parwez Alam
Arpit Shah
James Lim
Soon Yee Wong
Foltin Martin
Suparna Bhattacharya
VLM
35
3
0
21 Jun 2024
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
Lin Long
Rui Wang
Ruixuan Xiao
Junbo Zhao
Xiao Ding
Gang Chen
Haobo Wang
SyDa
53
91
0
14 Jun 2024
Multimodal Reasoning with Multimodal Knowledge Graph
Junlin Lee
Yequan Wang
Jing Li
Min Zhang
29
15
0
04 Jun 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
41
45
0
17 May 2024
Language-Image Models with 3D Understanding
Jang Hyun Cho
B. Ivanovic
Yulong Cao
Edward Schmerling
Yue Wang
...
Boyi Li
Yurong You
Philipp Krahenbuhl
Yan Wang
Marco Pavone
LRM
42
16
0
06 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
95
139
0
29 Apr 2024
What Makes Multimodal In-Context Learning Work?
Folco Bertini Baldassini
Mustafa Shukor
Matthieu Cord
Laure Soulier
Benjamin Piwowarski
37
18
0
24 Apr 2024
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation
Xun Wu
Shaohan Huang
Furu Wei
42
8
0
23 Apr 2024
How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning?
Yang Luo
Zangwei Zheng
Zirui Zhu
Yang You
41
5
0
19 Apr 2024
MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale
Xiaotang Gai
Chenyi Zhou
Jiaxiang Liu
Yang Feng
Jian Wu
Zuo-Qiang Liu
MedIm
36
6
0
18 Apr 2024
Aligning Actions and Walking to LLM-Generated Textual Descriptions
Radu Chivereanu
Adrian Cosma
Andy Catruna
R. Rughinis
I. Radoi
49
2
0
18 Apr 2024
Fewer Truncations Improve Language Modeling
Hantian Ding
Zijian Wang
Giovanni Paolini
Varun Kumar
Anoop Deoras
Dan Roth
Stefano Soatto
56
13
0
16 Apr 2024
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
Yipo Huang
Xiangfei Sheng
Zhichao Yang
Quan Yuan
Zhichao Duan
Pengfei Chen
Leida Li
Weisi Lin
Guangming Shi
34
23
0
15 Apr 2024
Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction
Bowen Zhang
Harold Soh
32
16
0
05 Apr 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Vishaal Udandarao
Ameya Prabhu
Adhiraj Ghosh
Yash Sharma
Philip H. S. Torr
Adel Bibi
Samuel Albanie
Matthias Bethge
VLM
126
44
0
04 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
63
33
0
29 Mar 2024
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
54
12
0
20 Mar 2024
Towards Multimodal In-Context Learning for Vision & Language Models
Sivan Doveh
Shaked Perek
M. Jehanzeb Mirza
Wei Lin
Amit Alfassy
Assaf Arbelle
S. Ullman
Leonid Karlinsky
VLM
110
14
0
19 Mar 2024
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
Xueqing Wu
Rui Zheng
Jingzhen Sha
Te-Lin Wu
Hanyu Zhou
Mohan Tang
Kai-Wei Chang
Nanyun Peng
Haoran Huang
52
1
0
04 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
79
177
0
29 Feb 2024
All in an Aggregated Image for In-Image Learning
Lei Wang
Wanyu Xu
Zhiqiang Hu
Yihuai Lan
Shan Dong
Hao Wang
Roy Ka-Wei Lee
Ee-Peng Lim
VLM
43
1
0
28 Feb 2024
The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative
Zhen Tan
Chengshuai Zhao
Raha Moraffah
Yifan Li
Yu Kong
Tianlong Chen
Huan Liu
36
15
0
20 Feb 2024
On the Convergence of Zeroth-Order Federated Tuning for Large Language Models
Zhenqing Ling
Daoyuan Chen
Liuyi Yao
Yaliang Li
Ying Shen
FedML
45
12
0
08 Feb 2024
A Survey on Safe Multi-Modal Learning System
Tianyi Zhao
Liangliang Zhang
Yao Ma
Lu Cheng
52
9
0
08 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
128
107
0
08 Feb 2024
When Large Language Models Meet Vector Databases: A Survey
Zhi Jing
Yongye Su
Yikun Han
Bo Yuan
Haiyun Xu
Chunjiang Liu
Kehai Chen
Min Zhang
53
35
0
30 Jan 2024
Red Teaming Visual Language Models
Mukai Li
Lei Li
Yuwei Yin
Masood Ahmed
Zhenguang Liu
Qi Liu
VLM
28
30
0
23 Jan 2024
Generative Multi-Modal Knowledge Retrieval with Large Language Models
Xinwei Long
Jiali Zeng
Fandong Meng
Zhiyuan Ma
Kaiyan Zhang
Bowen Zhou
Jie Zhou
37
15
0
16 Jan 2024
A Survey of Resource-efficient LLM and Multimodal Foundation Models
Mengwei Xu
Wangsong Yin
Dongqi Cai
Rongjie Yi
Daliang Xu
...
Shangguang Wang
Yuanchun Li
Yunxin Liu
Xin Jin
Xuanzhe Liu
VLM
75
75
0
16 Jan 2024
Aligned with LLM: a new multi-modal training paradigm for encoding fMRI activity in visual cortex
Shuxiao Ma
Linyuan Wang
Senbao Hou
Bin Yan
MLLM
35
1
0
08 Jan 2024
Multimodal Data Curation via Object Detection and Filter Ensembles
Tzu-Heng Huang
Changho Shin
Sui Jiet Tay
Dyah Adila
Frederic Sala
34
5
0
05 Jan 2024
Silkie: Preference Distillation for Large Visual Language Models
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
Liang Chen
Yazheng Yang
Benyou Wang
Lingpeng Kong
MLLM
110
68
0
17 Dec 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
M. Steyvers
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
141
177
0
01 Dec 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
158
1,012
0
25 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
121
375
0
07 Nov 2023
LMDX: Language Model-based Document Information Extraction and Localization
Vincent Perot
Kai Kang
Florian Luisier
Guolong Su
Xiaoyu Sun
...
Zifeng Wang
Jiaqi Mu
Hao Zhang
Chen-Yu Lee
Nan Hua
48
29
0
19 Sep 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
MLLM
115
227
0
18 Sep 2023
On the Adversarial Robustness of Multi-Modal Foundation Models
Christian Schlarmann
Matthias Hein
AAML
107
85
0
21 Aug 2023
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
Viorica Puatruaucean
Lucas Smaira
Ankush Gupta
Adrià Recasens Continente
L. Markeeva
...
Y. Aytar
Simon Osindero
Dima Damen
Andrew Zisserman
João Carreira
VLM
130
139
0
23 May 2023
ChatGPT as your Personal Data Scientist
Md. Mahadi Hassan
Alex Knipper
Shubhra (Santu) Karmaker
LM&MA
LLMAG
AI4CE
42
18
0
23 May 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
270
4,229
0
30 Jan 2023
Audio Retrieval with WavText5K and CLAP Training
Soham Deshmukh
Benjamin Elizalde
Huaming Wang
3DV
CLIP
113
50
0
28 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
211
1,105
0
20 Sep 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
228
4,460
0
23 Jan 2020
1