
HoloBrain-0 Technical Report

Xuewu Lin
Tianwei Lin
Yun Du
Hongyu Xie
Yiwei Jin
Jiawei Li
Shijie Wu
Qingze Wang
Mengdi Li
Mengao Zhao
Ziang Li
Chaodong Huang
Hongzhe Bi
Lichao Huang
Zhizhong Su
Main: 17 pages · 13 figures · 14 tables · Bibliography: 5 pages · Appendix: 10 pages
Abstract

In this work, we introduce HoloBrain-0, a comprehensive Vision-Language-Action (VLA) framework that bridges the gap between foundation model research and reliable real-world robot deployment. The core of our system is a novel VLA architecture that explicitly incorporates robot embodiment priors, including multi-view camera parameters and kinematic descriptions (URDF), to enhance 3D spatial reasoning and support diverse embodiments. We validate this design through a scalable "pre-train then post-train" paradigm, achieving state-of-the-art results on simulation benchmarks such as RoboTwin 2.0, LIBERO, and GenieSim, as well as strong results on challenging long-horizon real-world manipulation tasks. Notably, our efficient 0.2B-parameter variant rivals significantly larger baselines, enabling low-latency on-device deployment. To further accelerate research and practical adoption, we fully open-source the entire HoloBrain ecosystem, which includes: (1) powerful pre-trained VLA foundation models; (2) post-trained checkpoints for multiple simulation suites and real-world tasks; and (3) RoboOrchard, a full-stack VLA infrastructure for data curation, model training, and deployment. Together with standardized data collection protocols, this release provides the community with a complete, reproducible path toward high-performance robotic manipulation.
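The abstract states that the architecture conditions on multi-view camera parameters and URDF kinematic descriptions; the sketch below illustrates one generic way such embodiment priors could be tokenized for a transformer-based policy. It is a minimal sketch, not HoloBrain-0's actual implementation: the module name `EmbodimentPriorEncoder`, the feature layouts, and the dimensions are all illustrative assumptions.

```python
# A minimal sketch (not HoloBrain-0's actual design) of injecting robot
# embodiment priors -- per-view camera parameters and URDF-derived joint
# descriptors -- into a VLA backbone as extra conditioning tokens.
import torch
import torch.nn as nn


class EmbodimentPriorEncoder(nn.Module):
    """Encodes camera parameters and kinematic descriptors into tokens."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        # Each camera view is summarized by a flattened 3x3 intrinsic
        # matrix (9) plus a 4x4 world-to-camera extrinsic matrix (16).
        self.camera_proj = nn.Linear(25, d_model)
        # Each URDF joint is summarized by an assumed feature vector:
        # joint-type one-hot (3), axis (3), origin xyz+rpy (6),
        # position limits (2) -> 14 values.
        self.joint_proj = nn.Linear(14, d_model)

    def forward(self, cam_params: torch.Tensor, joint_feats: torch.Tensor):
        # cam_params:  (B, num_views, 25)
        # joint_feats: (B, num_joints, 14)
        cam_tokens = self.camera_proj(cam_params)    # (B, num_views, d)
        joint_tokens = self.joint_proj(joint_feats)  # (B, num_joints, d)
        # Concatenate along the token axis so a transformer backbone can
        # attend to embodiment priors alongside vision-language tokens.
        return torch.cat([cam_tokens, joint_tokens], dim=1)


if __name__ == "__main__":
    enc = EmbodimentPriorEncoder(d_model=256)
    cams = torch.randn(2, 3, 25)    # batch of 2, 3 camera views
    joints = torch.randn(2, 7, 14)  # a 7-DoF arm described by the URDF
    print(enc(cams, joints).shape)  # torch.Size([2, 10, 256])
```

Encoding the priors as tokens, rather than baking them into the weights, is one plausible route to the cross-embodiment support the abstract claims, since a new robot only changes the conditioning inputs.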
