MiMo-VL Technical Report

Main: 18 pages

14 figures

Bibliography: 6 pages

6 tables

Appendix: 8 pages

Abstract

We open-source MiMo-VL-7B-SFT and MiMo-VL-7B-RL, two powerful vision-language models that deliver state-of-the-art performance in both general visual understanding and multimodal reasoning. MiMo-VL-7B-RL outperforms Qwen2.5-VL-7B on 35 of 40 evaluated tasks and scores 59.4 on OlympiadBench, surpassing models with up to 78B parameters. For GUI grounding, it sets a new standard with 56.1 on OSWorld-G, outperforming even specialized models such as UI-TARS. Our training combines four-stage pre-training (2.4 trillion tokens) with Mixed On-policy Reinforcement Learning (MORL), which integrates diverse reward signals. We find that incorporating high-quality reasoning data with long Chain-of-Thought into the pre-training stages is important, and that mixed RL is beneficial despite the challenges of simultaneous multi-domain optimization. We also contribute a comprehensive evaluation suite covering 50+ tasks to promote reproducibility and advance the field. The model checkpoints and full evaluation suite are available at this https URL.


@article{team2025_2506.03569,
  title={ MiMo-VL Technical Report },
  author={ Xiaomi LLM-Core Team and Zihao Yue and Zhenru Lin and Yifan Song and Weikun Wang and Shuhuai Ren and Shuhao Gu and Shicheng Li and Peidian Li and Liang Zhao and Lei Li and Kainan Bao and Hao Tian and Hailin Zhang and Gang Wang and Dawei Zhu and Cici and Chenhong He and Bowen Ye and Bowen Shen and Zihan Zhang and Zihan Jiang and Zhixian Zheng and Zhichao Song and Zhenbo Luo and Yue Yu and Yudong Wang and Yuanyuan Tian and Yu Tu and Yihan Yan and Yi Huang and Xu Wang and Xinzhe Xu and Xingchen Song and Xing Zhang and Xing Yong and Xin Zhang and Xiangwei Deng and Wenyu Yang and Wenhan Ma and Weiwei Lv and Weiji Zhuang and Wei Liu and Sirui Deng and Shuo Liu and Shimao Chen and Shihua Yu and Shaohui Liu and Shande Wang and Rui Ma and Qiantong Wang and Peng Wang and Nuo Chen and Menghang Zhu and Kangyang Zhou and Kang Zhou and Kai Fang and Jun Shi and Jinhao Dong and Jiebao Xiao and Jiaming Xu and Huaqiu Liu and Hongshen Xu and Heng Qu and Haochen Zhao and Hanglong Lv and Guoan Wang and Duo Zhang and Dong Zhang and Di Zhang and Chong Ma and Chang Liu and Can Cai and Bingquan Xia },
  journal={arXiv preprint arXiv:2506.03569},
  year={ 2025 }
}
