Dual-system VLA (Vision-Language-Action) architectures have become a hot topic in embodied intelligence research, but too little open-source work exists to support further performance analysis and optimization. To address this, this paper summarizes and compares the structural designs of existing dual-system architectures and conducts systematic empirical evaluations of their core design elements. It also provides a low-cost open-source model for further exploration. The project will continue to be updated with more experimental conclusions and with improved open-source models. Project page: this https URL.
@article{cui2025_2505.03912,
  title={OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation},
  author={Can Cui and Pengxiang Ding and Wenxuan Song and Shuanghao Bai and Xinyang Tong and Zirui Ge and Runze Suo and Wanqi Zhou and Yang Liu and Bofang Jia and Han Zhao and Siteng Huang and Donglin Wang},
  journal={arXiv preprint arXiv:2505.03912},
  year={2025}
}