ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.04387
  4. Cited By
M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual
  Instruction Tuning
v1v2 (latest)

M3^33IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning

7 June 2023
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
Shuhuai Ren
Mukai Li
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
    MLLMVLM
ArXiv (abs)PDFHTML

Papers citing "M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning"

24 / 24 papers shown
Title
Context-Informed Grounding Supervision
Context-Informed Grounding Supervision
Hyunji Lee
Seunghyun Yoon
Yunjae Won
Hanseok Oh
Geewook Kim
Trung H. Bui
Franck Dernoncourt
Elias Stengel-Eskin
Mohit Bansal
Minjoon Seo
LRM
46
0
0
18 Jun 2025
VLM@school -- Evaluation of AI image understanding on German middle school knowledge
VLM@school -- Evaluation of AI image understanding on German middle school knowledge
René Peinl
Vincent Tischler
CoGeVLM
45
0
0
13 Jun 2025
Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion
Jacob A. Hansen
Wei Lin
Junmo Kang
M. Jehanzeb Mirza
Hongyin Luo
Rogerio Feris
Alan Ritter
James R. Glass
Leonid Karlinsky
VLM
264
0
0
23 May 2025
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Liqiang Jing
Guiming Hardy Chen
Ehsan Aghazadeh
Xin Eric Wang
Xinya Du
139
0
0
04 May 2025
MULTI: Multimodal Understanding Leaderboard with Text and Images
MULTI: Multimodal Understanding Leaderboard with Text and Images
Zichen Zhu
Yang Xu
Lu Chen
Jingkai Yang
Yichuan Ma
...
Yingzi Ma
Situo Zhang
Zihan Zhao
Liangtai Sun
Kai Yu
VLM
118
5
0
08 Jan 2025
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
S. Joshi
Besmira Nushi
Vidhisha Balachandran
Varun Chandrasekaran
Vibhav Vineet
Neel Joshi
Baharan Mirzasoleiman
MLLMVLM
172
0
0
07 Jan 2025
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li
Y. X. Wei
Zhihui Xie
Xuqing Yang
Yifan Song
...
Tianyu Liu
Sujian Li
Bill Yuchen Lin
Dianbo Sui
Qiang Liu
VLMCoGe
200
32
0
26 Nov 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
221
37
0
04 Oct 2024
LLaVA-Critic: Learning to Evaluate Multimodal Models
LLaVA-Critic: Learning to Evaluate Multimodal Models
Tianyi Xiong
Xinze Wang
Dong Guo
Qinghao Ye
Haoqi Fan
Quanquan Gu
Heng Huang
Chunyuan Li
MLLMVLMLRM
146
53
0
03 Oct 2024
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
Shengyu Feng
Xiang Kong
Shuang Ma
Aonan Zhang
Dong Yin
Chong-Jun Wang
Ruoming Pang
Yiming Yang
LRM
120
2
0
02 Oct 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue
Manli Shu
Anas Awadalla
Jun Wang
An Yan
...
Zeyuan Chen
Silvio Savarese
Juan Carlos Niebles
Caiming Xiong
Ran Xu
VLM
112
96
0
16 Aug 2024
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Jinsheng Huang
Liang Chen
Taian Guo
Fu Zeng
Yusheng Zhao
...
Wei Ju
Luchen Liu
Tianyu Liu
Baobao Chang
Ming Zhang
184
7
0
29 Jun 2024
Curriculum Learning with Quality-Driven Data Selection
Curriculum Learning with Quality-Driven Data Selection
Biao Wu
Fang Meng
126
2
0
27 Jun 2024
Parrot: Multilingual Visual Instruction Tuning
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun
Da-Wei Zhou
Yangfu Li
Shiyin Lu
Chao Yi
...
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
MLLM
163
12
0
04 Jun 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yondong Luo
Lei Li
Shuhuai Ren
...
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
Xing Sun
VLMMLLM
187
421
0
31 May 2024
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
Hongyu Wang
Jiayu Xu
Senwei Xie
Ruiping Wang
Jialin Li
Zhaojie Xie
Bin Zhang
Chuyan Xiong
Xilin Chen
ELMVLMLRM
165
6
0
24 May 2024
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Liqiang Jing
Xinya Du
191
18
0
07 Apr 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Nanyi Fei
Guoxing Yang
Zhiwu Lu
112
3
0
07 Mar 2024
Aligning Modalities in Vision Large Language Models via Preference
  Fine-tuning
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Yiyang Zhou
Chenhang Cui
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
VLMMLLM
134
121
0
18 Feb 2024
Cross-modal Retrieval for Knowledge-based Visual Question Answering
Cross-modal Retrieval for Knowledge-based Visual Question Answering
Paul Lerner
Olivier Ferret
C. Guinaudeau
89
9
0
11 Jan 2024
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Jinpeng Wang
Chuyuan Wang
Mingchen Cai
Ruihua Song
Ji-Rong Wen
VLMMLLMLRM
187
23
0
02 Nov 2023
ReForm-Eval: Evaluating Large Vision Language Models via Unified
  Re-Formulation of Task-Oriented Benchmarks
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Zejun Li
Ye Wang
Mengfei Du
Qingwen Liu
Binhao Wu
...
Zhihao Fan
Jie Fu
Jingjing Chen
Xuanjing Huang
Zhongyu Wei
122
15
0
04 Oct 2023
Towards End-to-End Embodied Decision Making via Multi-modal Large
  Language Model: Explorations with GPT4-Vision and Beyond
Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond
Liang Chen
Yichi Zhang
Shuhuai Ren
Haozhe Zhao
Zefan Cai
Yuchi Wang
Peiyi Wang
Tianyu Liu
Baobao Chang
LM&RoLLMAG
182
44
0
03 Oct 2023
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the
  Wild
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Huayang Li
Siheng Li
Deng Cai
Longyue Wang
Lemao Liu
Taro Watanabe
Yujiu Yang
Shuming Shi
MLLM
145
18
0
14 Sep 2023
1