Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video

28 April 2025

Abstract

Neural Radiance Fields (NeRF) have demonstrated a superior capability to represent 3D geometry but require accurately precomputed camera poses during training. To mitigate this requirement, existing methods jointly optimize camera poses and NeRF, often relying on good pose initialisation or depth priors. However, these approaches struggle in challenging scenarios, such as large rotations, as they map each camera to a world coordinate system. We propose a novel method that eliminates prior dependencies by modeling continuous camera motion as time-dependent angular velocity and velocity. Relative motions between cameras are learned first via velocity integration, and camera poses are then obtained by aggregating these relative motions up to a world coordinate system defined at a single time step within the video. Specifically, accurate continuous camera movements are learned through a time-dependent NeRF, which captures local scene geometry and motion by training on neighboring frames for each time step. The learned motions enable fine-tuning the NeRF to represent the full scene geometry. Experiments on Co3D and ScanNet show that our approach achieves superior camera pose and depth estimation, and comparable novel-view synthesis performance, relative to state-of-the-art methods. Our code is available at this https URL.
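To make the idea of recovering poses from velocities concrete, the following is a minimal sketch and not the authors' implementation. It assumes a constant angular velocity and velocity over each frame interval, a first-order translation update, and a convention in which each relative motion maps frame i+1 camera coordinates into frame i. The function names (`so3_exp`, `relative_motion`, `aggregate_poses`) and the `ref` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np

def skew(w):
    """Return the 3x3 skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(phi):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    K = skew(phi)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near zero
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def relative_motion(omega, v, dt):
    """Integrate constant velocities over dt into a 4x4 relative transform.

    Assumption: translation is approximated as v * dt (first order).
    """
    T = np.eye(4)
    T[:3, :3] = so3_exp(np.asarray(omega, dtype=float) * dt)
    T[:3, 3] = np.asarray(v, dtype=float) * dt
    return T

def aggregate_poses(omegas, vels, dt, ref=0):
    """Chain relative motions into camera-to-world poses.

    The world frame is defined as the camera frame at index `ref`.
    rel[i] is assumed to map frame i+1 coordinates into frame i.
    """
    rel = [relative_motion(w, v, dt) for w, v in zip(omegas, vels)]
    n = len(rel) + 1
    poses = [None] * n
    poses[ref] = np.eye(4)
    for i in range(ref, n - 1):        # frames after the reference
        poses[i + 1] = poses[i] @ rel[i]
    for i in range(ref - 1, -1, -1):   # frames before the reference
        poses[i] = poses[i + 1] @ np.linalg.inv(rel[i])
    return np.stack(poses)
```

For example, `aggregate_poses(omegas, vels, dt=1.0, ref=2)` returns one pose per frame with the identity at index 2. Moving the reference frame only changes all poses by a common rigid transform, which is why the world coordinate system can be defined at any single time step.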

@article{nguyen2025_2504.19819,
  title={Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video},
  author={Hoang Chuong Nguyen and Wei Mao and Jose M. Alvarez and Miaomiao Liu},
  journal={arXiv preprint arXiv:2504.19819},
  year={2025}
}