Temporal Triplane Transformers as Occupancy World Models

Abstract

World models aim to learn or construct representations of the environment that enable the prediction of future scenes, thereby supporting intelligent motion planning. However, existing models often struggle to produce fine-grained predictions and to operate in real time. In this work, we propose T$^3$Former, a novel 4D occupancy world model for autonomous driving. T$^3$Former begins by pre-training a compact {\em triplane} representation that efficiently encodes 3D occupancy. It then extracts multi-scale temporal motion features from historical triplanes and employs an autoregressive approach to iteratively predict future triplane changes. Finally, these triplane changes are combined with previous states to decode future occupancy and ego-motion trajectories. Experimental results show that T$^3$Former achieves 1.44$\times$ speedup (26 FPS), improves mean IoU to 36.09, and reduces mean absolute planning error to 1.0 meters. Demos are available in the supplementary material.
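
The autoregressive step described in the abstract (predict a triplane change, add it to the previous state, feed the result back in) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the array layout `(T, 3, C, H, W)` with three equally sized planes, the function names `predict_delta` and `rollout`, and above all the predictor itself, which here is a simple average of recent frame-to-frame differences standing in for the learned multi-scale transformer.

```python
import numpy as np


def predict_delta(history: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the learned motion predictor.

    history has shape (T, 3, C, H, W): T past triplanes, each made of
    three feature planes (assumed equally sized for simplicity).
    Returns the mean frame-to-frame change, shape (3, C, H, W).
    """
    diffs = np.diff(history, axis=0)  # (T-1, 3, C, H, W)
    return diffs.mean(axis=0)


def rollout(history: np.ndarray, steps: int) -> np.ndarray:
    """Autoregressively predict `steps` future triplanes.

    Each step predicts a change from a sliding window of the most
    recent triplanes and adds it to the latest state; the new state
    then becomes part of the window for the next step.
    """
    window_len = len(history)
    frames = list(history)
    future = []
    for _ in range(steps):
        window = np.stack(frames[-window_len:])
        delta = predict_delta(window)
        next_state = frames[-1] + delta  # previous state + predicted change
        frames.append(next_state)
        future.append(next_state)
    return np.stack(future)  # (steps, 3, C, H, W)


if __name__ == "__main__":
    # Toy history: three triplanes whose features grow by 1 per frame.
    history = np.stack([np.full((3, 2, 4, 4), float(t)) for t in range(3)])
    future = rollout(history, steps=2)
    print(future.shape)
```

In this sketch the predicted triplanes would still need a decoder to produce occupancy and ego-motion outputs; that stage is omitted because the abstract does not specify its form.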

@article{xu2025_2503.07338,
  title={Temporal Triplane Transformers as Occupancy World Models},
  author={Haoran Xu and Peixi Peng and Guang Tan and Yiqian Chang and Yisen Zhao and Yonghong Tian},
  journal={arXiv preprint arXiv:2503.07338},
  year={2025}
}