arXiv: 2410.02740
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
3 October 2024
Zhengfeng Lai, Vasileios Saveris, C. L. P. Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Juan Lao Tebar, Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang
VLM
Papers citing
"Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models"
2 / 2 papers shown
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
S. Joshi, Besmira Nushi, Vidhisha Balachandran, Varun Chandrasekaran, Vibhav Vineet, Neel Joshi, Baharan Mirzasoleiman
MLLM
VLM
49
0
0
07 Jan 2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, ..., Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang
VLM
MLLM
40
32
1
30 Sep 2024