ResearchTrend.AI
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv: 2504.10479 (latest version: v3)

14 April 2025
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
Lixin Gu
Yuchen Duan
H. Tian
Weijie Su
Jie Shao
Zhangwei Gao
Erfei Cui
Yue Cao
Yangzhou Liu
Xingguang Wei
Hongjie Zhang
Haomin Wang
Wenyuan Xu
Hao Li
Jiahao Wang
Dengnian Chen
Songze Li
Yinan He
Tan Jiang
Jiapeng Luo
Yi Wang
Conghui He
Botian Shi
Xinsong Zhang
Wenqi Shao
Junjun He
Yingtong Xiong
Wenwen Qu
Peng Sun
Penglong Jiao
Han Lv
Lijun Wu
Kai Zhang
Huipeng Deng
Jiaye Ge
Kai Chen
Limin Wang
Min Dou
Lewei Lu
X. Zhu
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM, VLM

Papers citing "InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models"

50 / 161 papers shown
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
127
91
1
15 Nov 2024
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Zhiyong Wu
Zhenyu Wu
Fangzhi Xu
Yian Wang
Qiushi Sun
...
Kanzhi Cheng
Zichen Ding
Lixing Chen
Paul Pu Liang
Yu Qiao
81
73
0
30 Oct 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
204
1,019
0
25 Oct 2024
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Shuhao Gu
Jialing Zhang
Siyuan Zhou
Kevin Yu
Zhaohu Xing
...
Yufeng Cui
Xinlong Wang
Yaoqi Liu
Fangxiang Feng
Guang Liu
SyDa, VLM, MLLM
88
29
0
24 Oct 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
135
39
0
21 Oct 2024
R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?
Chunyi Li
Junxuan Zhang
Zicheng Zhang
H. Wu
Yuan Tian
...
Guo Lu
Xiaohong Liu
Xiongkuo Min
Weisi Lin
Guangtao Zhai
AAML
69
4
0
07 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM, MLLM
105
40
1
30 Sep 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Matt Deitke
Christopher Clark
Sangho Lee
Rohun Tripathi
Yue Yang
...
Noah A. Smith
Hannaneh Hajishirzi
Ross Girshick
Ali Farhadi
Aniruddha Kembhavi
OSLM, VLM
80
14
0
25 Sep 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
153
72
0
19 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Wei Ping
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
Mohammad Shoeybi
Bryan Catanzaro
Ming-Yu Liu
MLLM, VLM, LRM
80
73
0
17 Sep 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Zhiding Yu
Guilin Liu
MLLM
120
68
0
28 Aug 2024
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Yi-Fan Zhang
Huanyu Zhang
Haochen Tian
Chaoyou Fu
Shuangqing Zhang
...
Qingsong Wen
Zhang Zhang
Liwen Wang
Rong Jin
Tieniu Tan
OffRL
115
52
0
23 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM, SyDa, VLM
117
860
0
06 Aug 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell
Jaehoon Lee
Kelvin Xu
Aviral Kumar
LRM
192
692
0
06 Aug 2024
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng
Jun Wang
Chuanhao Li
Quanfeng Lu
Hao Tian
...
Jifeng Dai
Ping Luo
Kaipeng Zhang
Wenqi Shao
VLM
84
26
0
05 Aug 2024
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Yuan Yao
Tianyu Yu
Ao Zhang
Chongyi Wang
Junbo Cui
...
Xu Han
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
VLM, MLLM
116
475
0
03 Aug 2024
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linfeng Ren
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
Xinchao Wang
VLM, MLLM
101
25
0
01 Aug 2024
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
Haoning Wu
Dongxu Li
Bei Chen
Junnan Li
96
163
0
22 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA, VLM
148
177
0
16 Jul 2024
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts
Yijia Xiao
Edward Sun
Tianyu Liu
Wei Wang
LRM
58
42
0
06 Jul 2024
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Runqi Qiao
Qiuna Tan
Guanting Dong
Minhui Wu
Chong Sun
...
Yida Xu
Muxi Diao
Zhimin Bao
Chen Li
Honggang Zhang
VLM, LRM
90
55
0
01 Jul 2024
LLM Critics Help Catch LLM Bugs
Nat McAleese
Rai Michael Pokorny
Juan Felipe Cerón Uribe
Evgenia Nitishinskaya
Maja Trebacz
Jan Leike
ALM, LRM
62
82
0
28 Jun 2024
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Zirui Wang
Mengzhou Xia
Luxi He
Howard Chen
Yitao Liu
...
Haotian Liu
Sadhika Malladi
Alexis Chevalier
Sanjeev Arora
Danqi Chen
61
64
0
26 Jun 2024
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
Wenhao Shi
Zhiqiang Hu
Yi Bin
Junhua Liu
Yang Yang
See-Kiong Ng
Lidong Bing
Roy Ka-Wei Lee
SyDa, MLLM, LRM
84
62
0
25 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV, MLLM
118
377
0
24 Jun 2024
Long Context Transfer from Language to Vision
Peiyuan Zhang
Kaichen Zhang
Bo Li
Guangtao Zeng
Jingkang Yang
Yuanhan Zhang
Ziyue Wang
Haoran Tan
Chunyuan Li
Ziwei Liu
VLM
121
186
0
24 Jun 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
Xinyu Fang
Kangrui Mao
Haodong Duan
Xiangyu Zhao
Yining Li
Dahua Lin
Kai Chen
VLM
95
82
0
20 Jun 2024
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
Bingchen Zhao
Yongshuo Zong
Letian Zhang
Timothy Hospedales
VLM
86
19
0
18 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
98
33
0
16 Jun 2024
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang
Xingyu Fu
James Y. Huang
Zekun Li
Qin Liu
...
Kai-Wei Chang
Dan Roth
Sheng Zhang
Hoifung Poon
Muhao Chen
VLM
103
59
0
13 Jun 2024
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
Bang Liu
Yoshua Bengio
53
5
0
10 Jun 2024
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
Liangchen Luo
Yinxiao Liu
Rosanne Liu
Samrat Phatale
Harsh Lara
...
Lei Shu
Yun Zhu
Lei Meng
Jiao Sun
Abhinav Rastogi
LRM
93
189
0
05 Jun 2024
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun
Da-Wei Zhou
Yangfu Li
Shiyin Lu
Chao Yi
...
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
MLLM
106
12
0
04 Jun 2024
Ovis: Structural Embedding Alignment for Multimodal Large Language Model
Shiyin Lu
Yang Li
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
Han-Jia Ye
VLM, MLLM
112
55
0
31 May 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yondong Luo
Lei Li
Shuhuai Ren
...
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
VLM, MLLM
154
418
0
31 May 2024
M³CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought
Qiguang Chen
Libo Qin
Jin Zhang
Zhi Chen
Xiao Xu
Wanxiang Che
LRM
101
61
0
26 May 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi-dong Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Hao Liu
Xiang Bai
Can Huang
124
28
0
20 May 2024
MANTIS: Interleaved Multi-Image Instruction Tuning
Dongfu Jiang
Xuan He
Huaye Zeng
Cong Wei
Max Ku
Qian Liu
Wenhu Chen
VLM, MLLM
87
125
0
02 May 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM, VLM
113
637
0
25 Apr 2024
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Bohao Li
Yuying Ge
Yi Chen
Yixiao Ge
Ruimao Zhang
Ying Shan
VLM
74
60
0
25 Apr 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying
Fanqing Meng
Jin Wang
Zhiqiang Li
Han Lin
...
Yali Wang
Yuning Qiao
Ping Luo
Kaipeng Zhang
Wenqi Shao
73
97
0
24 Apr 2024
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu
Yushi Hu
Bangzheng Li
Yu Feng
Haoyu Wang
Xudong Lin
Dan Roth
Noah A. Smith
Wei-Chiu Ma
Ranjay Krishna
VLM, LRM, MLLM
94
149
0
18 Apr 2024
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Haotian Zhang
Haoxuan You
Philipp Dufter
Bowen Zhang
Chen Chen
...
Tsu-Jui Fu
William Y. Wang
Shih-Fu Chang
Zhe Gan
Yinfei Yang
ObjD, MLLM
130
51
0
11 Apr 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Xingcheng Zhang
Jifeng Dai
Yuxin Qiao
Dahua Lin
Jiaqi Wang
VLM, MLLM
93
127
0
09 Apr 2024
Binary Classifier Optimization for Large Language Model Alignment
Seungjae Jung
Gunsoo Han
D. W. Nam
Kyoung-Woon On
72
25
0
06 Apr 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
...
Haodong Duan
Jiaqi Wang
Yu Qiao
Dahua Lin
Feng Zhao
VLM
118
302
0
29 Mar 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang
Dongzhi Jiang
Yichi Zhang
Haokun Lin
Ziyu Guo
...
Aojun Zhou
Pan Lu
Kai-Wei Chang
Peng Gao
Hongsheng Li
71
252
0
21 Mar 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang
Yiming Ren
Hao Luo
Tiantong Li
Chenxiang Yan
...
Qingyun Li
Lewei Lu
Xizhou Zhu
Yu Qiao
Jifeng Dai
MLLM
89
53
0
29 Feb 2024
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
Ke Wang
Junting Pan
Weikang Shi
Zimu Lu
Mingjie Zhan
Hongsheng Li
84
187
0
22 Feb 2024
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
Qiaoling Chen
Diandian Gu
Guoteng Wang
Xun Chen
Yingtong Xiong
...
Qi Hu
Xin Jin
Yonggang Wen
Tianwei Zhang
Peng Sun
74
8
0
17 Jan 2024