MiniCPM4: Ultra-Efficient LLMs on End Devices
This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both the prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using only 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and BitCPM, a data-efficient ternary LLM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Extensive evaluation results show that MiniCPM4 outperforms open-source models of similar size across multiple benchmarks, highlighting both its efficiency and effectiveness. Notably, MiniCPM4-8B demonstrates significant speed improvements over Qwen3-8B when processing long sequences. Through further adaptation, MiniCPM4 successfully powers diverse applications, including trustworthy survey generation and tool use with the Model Context Protocol, clearly showcasing its broad usability.
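Among these components, the trainable sparse attention of InfLLM v2 is the central architectural change. The sketch below illustrates the general idea of block-level sparse attention, in which each query scores coarse blocks of the key-value cache and attends only within the highest-scoring blocks. The block size, mean-pooled block scoring, and top-k selection here are simplifying assumptions for illustration, not the exact InfLLM v2 design described in the paper.

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, top_k=8):
    """
    Illustrative block-level sparse attention for a single head.

    q: (n_q, d)   query vectors (e.g., one decoding step or a prefill chunk)
    k: (n_kv, d)  cached keys for a long context
    v: (n_kv, d)  cached values

    For each query, score context blocks via a pooled block representation,
    keep only the top-k blocks, and run dense attention inside that subset.
    """
    n_kv, d = k.shape
    n_blocks = (n_kv + block_size - 1) // block_size

    # Pad the cache so it splits evenly into blocks.
    pad = n_blocks * block_size - n_kv
    k_blocks = F.pad(k, (0, 0, 0, pad)).view(n_blocks, block_size, d)
    v_blocks = F.pad(v, (0, 0, 0, pad)).view(n_blocks, block_size, d)

    # Block-level relevance: query against mean-pooled block keys.
    block_repr = k_blocks.mean(dim=1)                       # (n_blocks, d)
    block_scores = q @ block_repr.T / d ** 0.5              # (n_q, n_blocks)
    top_idx = block_scores.topk(min(top_k, n_blocks), dim=-1).indices

    out = torch.empty_like(q)
    for i in range(q.shape[0]):
        # Token-level attention restricted to the selected blocks only.
        sel_k = k_blocks[top_idx[i]].reshape(-1, d)
        sel_v = v_blocks[top_idx[i]].reshape(-1, d)
        attn = torch.softmax(q[i] @ sel_k.T / d ** 0.5, dim=-1)
        out[i] = attn @ sel_v
    return out

# Toy usage: one decoding query attending over a 4k-token cache.
q = torch.randn(1, 128)
k = torch.randn(4096, 128)
v = torch.randn(4096, 128)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([1, 128])
```

Restricting each query to a fixed number of blocks makes the attention cost scale with the number of selected tokens rather than the full context length, which is what enables faster prefilling and decoding on long inputs.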
@article{team2025_2506.07900,
  title={MiniCPM4: Ultra-Efficient LLMs on End Devices},
  author={MiniCPM Team and Chaojun Xiao and Yuxuan Li and Xu Han and Yuzhuo Bai and Jie Cai and Haotian Chen and Wentong Chen and Xin Cong and Ganqu Cui and Ning Ding and Shengdan Fan and Yewei Fang and Zixuan Fu and Wenyu Guan and Yitong Guan and Junshao Guo and Yufeng Han and Bingxiang He and Yuxiang Huang and Cunliang Kong and Qiuzuo Li and Siyuan Li and Wenhao Li and Yanghao Li and Yishan Li and Zhen Li and Dan Liu and Biyuan Lin and Yankai Lin and Xiang Long and Quanyu Lu and Yaxi Lu and Peiyan Luo and Hongya Lyu and Litu Ou and Yinxu Pan and Zekai Qu and Qundong Shi and Zijun Song and Jiayuan Su and Zhou Su and Ao Sun and Xianghui Sun and Peijun Tang and Fangzheng Wang and Feng Wang and Shuo Wang and Yudong Wang and Yesai Wu and Zhenyu Xiao and Jie Xie and Zihao Xie and Yukun Yan and Jiarui Yuan and Kaihuo Zhang and Lei Zhang and Linyue Zhang and Xueren Zhang and Yudi Zhang and Hengyu Zhao and Weilin Zhao and Weilun Zhao and Yuanqian Zhao and Zhi Zheng and Ge Zhou and Jie Zhou and Wei Zhou and Zihan Zhou and Zixuan Zhou and Zhiyuan Liu and Guoyang Zeng and Chao Jia and Dahai Li and Maosong Sun},
  journal={arXiv preprint arXiv:2506.07900},
  year={2025}
}