
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Main:8 Pages
13 Figures
Bibliography:4 Pages
7 Tables
Appendix:6 Pages
Abstract

Despite remarkable advances in video depth estimation, existing methods produce affine-invariant predictions, which are defined only up to an unknown scale and shift. This inherently limits their geometric fidelity and their applicability to reconstruction and other metrically grounded downstream tasks. We propose GeometryCrafter, a novel framework that recovers temporally coherent, high-fidelity point map sequences from open-world videos, enabling accurate 3D/4D reconstruction, camera parameter estimation, and other depth-based applications. At the core of our approach lies a point map Variational Autoencoder (VAE) that learns a latent space agnostic to video latent distributions for effective point map encoding and decoding. Leveraging the VAE, we train a video diffusion model to capture the distribution of point map sequences conditioned on the input videos. Extensive evaluations on diverse datasets demonstrate that GeometryCrafter achieves state-of-the-art 3D accuracy, temporal consistency, and generalization capability.
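To make the limitation of affine-invariant depth concrete, the following is a minimal sketch, not the authors' method. It assumes a simple pinhole camera with made-up intrinsics (`fx`, `fy`, `cx`, `cy`) and a toy 2x2 depth map; the helper names `unproject`, `angle_at`, and `corner_angle` are hypothetical. It lifts depth to a point map and shows that an unknown scale leaves the recovered shape intact, whereas an unknown shift changes the angles between points and therefore distorts the geometry.

```python
import math

def unproject(depth, fx, fy, cx, cy):
    """Lift a depth map (list of rows) to camera-space (x, y, z) points
    under an assumed pinhole model."""
    return [
        [((u - cx) * z / fx, (v - cy) * z / fy, z) for u, z in enumerate(row)]
        for v, row in enumerate(depth)
    ]

def angle_at(p, a, b):
    """Angle in radians at point p between the segments p->a and p->b."""
    va = [a[i] - p[i] for i in range(3)]
    vb = [b[i] - p[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

# Hypothetical intrinsics and toy depth values, chosen only for illustration.
K = dict(fx=1.0, fy=1.0, cx=0.5, cy=0.5)
depth = [[2.0, 3.0], [2.5, 4.0]]

def corner_angle(d):
    """Angle at the top-left 3D point between its right and lower neighbours."""
    pts = unproject(d, **K)
    return angle_at(pts[0][0], pts[0][1], pts[1][0])

true_angle = corner_angle(depth)
scaled = [[2.0 * z for z in row] for row in depth]   # unknown scale only
shifted = [[z + 1.0 for z in row] for row in depth]  # unknown shift

print("true   :", true_angle)
print("scaled :", corner_angle(scaled))   # same shape, different size
print("shifted:", corner_angle(shifted))  # shape is distorted
```

A scale error is a similarity transform of the scene, so angles survive; a shift error is not, which is one way to see why shift-ambiguous depth is a poor basis for metrically grounded reconstruction.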

@article{xu2025_2504.01016,
  title={GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors},
  author={Tian-Xing Xu and Xiangjun Gao and Wenbo Hu and Xiaoyu Li and Song-Hai Zhang and Ying Shan},
  journal={arXiv preprint arXiv:2504.01016},
  year={2025}
}