Kimi-VL Technical Report

10 April 2025
Kimi Team
Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, Haoning Wu, Haotian Yao, Haoyu Lu, Heng Wang, Hongcheng Gao, Huabin Zheng, Jiaming Li, Jianlin Su, Jianzhou Wang, Jiaqi Deng, Jiezhong Qiu, Jin Xie, Jinhong Wang, Jingyuan Liu, Junjie Yan, Kun Ouyang, Liang Chen, Lin Sui, Longhui Yu, Mengfan Dong, Mengnan Dong, Nuo Xu, Pengyu Cheng, Qizheng Gu, Runjie Zhou, Shaowei Liu, Sihan Cao, Tao Yu, Tianhui Song, Tongtong Bai, Wei Song, Weiran He, Weixiao Huang, Weixin Xu, Xiaokun Yuan, Xingcheng Yao, Xingzhe Wu, Xinxing Zu, Xinyu Zhou, Xinyuan Wang, Y. Charles, Yan Zhong, Yang Li, Yangyang Hu, Yanru Chen, Yejie Wang, Yibo Liu, Yibo Miao, Yidao Qin, Yimin Chen, Yiping Bao, Yiqin Wang, Yongsheng Kang, Yuanxin Liu, Yulun Du, Yuxin Wu, Yuzhi Wang, Yuzi Yan, Zaida Zhou, Zhaowei Li, Zhejun Jiang, Zheng Zhang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Zijia Zhao, Ziwei Chen, Zongyu Lin
MLLM · VLM · MoE
Abstract

We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities, all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B). Kimi-VL demonstrates strong performance across challenging domains: as a general-purpose VLM, it excels in multi-turn agent tasks (e.g., OSWorld), matching flagship models. It also exhibits remarkable capabilities across diverse challenging vision-language tasks, including college-level image and video comprehension, OCR, mathematical reasoning, and multi-image understanding. In comparative evaluations, it competes effectively with cutting-edge efficient VLMs such as GPT-4o-mini, Qwen2.5-VL-7B, and Gemma-3-12B-IT, while surpassing GPT-4o in several key domains. Kimi-VL also advances the state of long-context processing and fine-grained perception. With a 128K extended context window, it can process diverse long inputs, achieving impressive scores of 64.5 on LongVideoBench and 35.1 on MMLongBench-Doc. Its native-resolution vision encoder, MoonViT, further allows it to see and understand ultra-high-resolution visual inputs, achieving 83.2 on InfoVQA and 34.5 on ScreenSpot-Pro, while maintaining lower computational cost for common tasks. Building upon Kimi-VL, we introduce an advanced long-thinking variant: Kimi-VL-Thinking. Developed through long chain-of-thought (CoT) supervised fine-tuning (SFT) and reinforcement learning (RL), this model exhibits strong long-horizon reasoning capabilities. It achieves scores of 61.7 on MMMU, 36.8 on MathVision, and 71.3 on MathVista while maintaining the compact 2.8B activated LLM parameters, setting a new standard for efficient multimodal thinking models. Code and models are publicly accessible at this https URL.
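The headline efficiency claim, only 2.8B activated language-decoder parameters, follows from sparse MoE routing: each token is dispatched to a small top-k subset of experts, so most of the model's weights sit idle on any given forward pass. Below is a minimal toy sketch of that routing pattern in PyTorch; the expert count, top-k value, and dimensions are invented for illustration and are not Kimi-VL's actual configuration.

```python
# Toy sketch of sparse Mixture-of-Experts (MoE) routing, illustrating why a
# model can hold many total parameters yet "activate" only a fraction per
# token. Illustrative only -- not Kimi-VL's implementation; all sizes are
# made-up values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Pick the top-k experts per token; only those
        # experts' weights participate in computation for that token.
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(5, 64)
print(layer(tokens).shape)  # torch.Size([5, 64]); same shape, sparse compute
```

With 8 experts and top-2 routing, only about a quarter of the expert parameters run per token, which is the same mechanism that lets a much larger Kimi-VL checkpoint behave like a 2.8B-activated model at inference time.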

View on arXiv: https://arxiv.org/abs/2504.07491
@article{team2025_2504.07491,
  title={Kimi-VL Technical Report},
  author={Kimi Team and Angang Du and Bohong Yin and Bowei Xing and Bowen Qu and Bowen Wang and Cheng Chen and Chenlin Zhang and Chenzhuang Du and Chu Wei and Congcong Wang and Dehao Zhang and Dikang Du and Dongliang Wang and Enming Yuan and Enzhe Lu and Fang Li and Flood Sung and Guangda Wei and Guokun Lai and Han Zhu and Hao Ding and Hao Hu and Hao Yang and Hao Zhang and Haoning Wu and Haotian Yao and Haoyu Lu and Heng Wang and Hongcheng Gao and Huabin Zheng and Jiaming Li and Jianlin Su and Jianzhou Wang and Jiaqi Deng and Jiezhong Qiu and Jin Xie and Jinhong Wang and Jingyuan Liu and Junjie Yan and Kun Ouyang and Liang Chen and Lin Sui and Longhui Yu and Mengfan Dong and Mengnan Dong and Nuo Xu and Pengyu Cheng and Qizheng Gu and Runjie Zhou and Shaowei Liu and Sihan Cao and Tao Yu and Tianhui Song and Tongtong Bai and Wei Song and Weiran He and Weixiao Huang and Weixin Xu and Xiaokun Yuan and Xingcheng Yao and Xingzhe Wu and Xinxing Zu and Xinyu Zhou and Xinyuan Wang and Y. Charles and Yan Zhong and Yang Li and Yangyang Hu and Yanru Chen and Yejie Wang and Yibo Liu and Yibo Miao and Yidao Qin and Yimin Chen and Yiping Bao and Yiqin Wang and Yongsheng Kang and Yuanxin Liu and Yulun Du and Yuxin Wu and Yuzhi Wang and Yuzi Yan and Zaida Zhou and Zhaowei Li and Zhejun Jiang and Zheng Zhang and Zhilin Yang and Zhiqi Huang and Zihao Huang and Zijia Zhao and Ziwei Chen and Zongyu Lin},
  journal={arXiv preprint arXiv:2504.07491},
  year={2025}
}
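Since the abstract states that code and models are publicly accessible, here is a minimal inference sketch assuming the checkpoints are packaged in standard Hugging Face transformers style. The repository id moonshotai/Kimi-VL-A3B-Instruct, the chat-message schema, and the trust_remote_code usage are assumptions based on common VLM releases, not details confirmed by this page.

```python
# Minimal inference sketch, assuming a standard transformers packaging.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "moonshotai/Kimi-VL-A3B-Instruct"  # assumed repository id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # assumed: custom model code shipped with the checkpoint
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# A single image plus a question, formatted via the processor's chat template.
image = Image.open("example.png")  # any local image
messages = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "Describe this image."}]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```

The long-thinking variant, Kimi-VL-Thinking, would presumably be loaded the same way under a different repository id.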