Seed1.5-VL Technical Report

Abstract

We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM with 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites, achieving state-of-the-art performance on 38 out of 60 public benchmarks. Moreover, in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. Beyond visual and video understanding, it also demonstrates strong reasoning abilities, making it particularly effective for multimodal reasoning challenges such as visual puzzles. We believe these capabilities will empower broader applications across diverse tasks. In this report, we provide a comprehensive review of our experience building Seed1.5-VL across model design, data construction, and training at various stages, in the hope that it can inspire further research. Seed1.5-VL is now accessible at this https URL (Volcano Engine Model ID: doubao-1-5-thinking-vision-pro-250428).

@article{guo2025_2505.07062,
  title={Seed1.5-VL Technical Report},
  author={ Dong Guo and Faming Wu and Feida Zhu and Fuxing Leng and Guang Shi and Haobin Chen and Haoqi Fan and Jian Wang and Jianyu Jiang and Jiawei Wang and Jingji Chen and Jingjia Huang and Kang Lei and Liping Yuan and Lishu Luo and Pengfei Liu and Qinghao Ye and Rui Qian and Shen Yan and Shixiong Zhao and Shuai Peng and Shuangye Li and Sihang Yuan and Sijin Wu and Tianheng Cheng and Weiwei Liu and Wenqian Wang and Xianhan Zeng and Xiao Liu and Xiaobo Qin and Xiaohan Ding and Xiaojun Xiao and Xiaoying Zhang and Xuanwei Zhang and Xuehan Xiong and Yanghua Peng and Yangrui Chen and Yanwei Li and Yanxu Hu and Yi Lin and Yiyuan Hu and Yiyuan Zhang and Youbin Wu and Yu Li and Yudong Liu and Yue Ling and Yujia Qin and Zanbo Wang and Zhiwu He and Aoxue Zhang and Bairen Yi and Bencheng Liao and Can Huang and Can Zhang and Chaorui Deng and Chaoyi Deng and Cheng Lin and Cheng Yuan and Chenggang Li and Chenhui Gou and Chenwei Lou and Chengzhi Wei and Chundian Liu and Chunyuan Li and Deyao Zhu and Donghong Zhong and Feng Li and Feng Zhang and Gang Wu and Guodong Li and Guohong Xiao and Haibin Lin and Haihua Yang and Haoming Wang and Heng Ji and Hongxiang Hao and Hui Shen and Huixia Li and Jiahao Li and Jialong Wu and Jianhua Zhu and Jianpeng Jiao and Jiashi Feng and Jiaze Chen and Jianhui Duan and Jihao Liu and Jin Zeng and Jingqun Tang and Jingyu Sun and Joya Chen and Jun Long and Junda Feng and Junfeng Zhan and Junjie Fang and Junting Lu and Kai Hua and Kai Liu and Kai Shen and Kaiyuan Zhang and Ke Shen },
  journal={arXiv preprint arXiv:2505.07062},
  year={2025}
}