AnyView: Synthesizing Any Novel View in Dynamic Scenes

Basile Van Hoorick
Dian Chen
Shun Iwase
Pavel Tokmakov
Muhammad Zubair Irshad
Igor Vasiljevic
Swati Gupta
Fangzhou Cheng
Sergey Zakharov
Vitor Campagnolo Guizilini
Main: 8 pages, 12 figures, Bibliography: 4 pages, 5 tables, Appendix: 6 pages
Abstract

Modern generative video models excel at producing convincing, high-quality outputs, but struggle to maintain multi-view and spatiotemporal consistency in highly dynamic real-world environments. In this work, we introduce AnyView, a diffusion-based video generation framework for dynamic view synthesis with minimal inductive biases or geometric assumptions. We leverage multiple data sources with various levels of supervision, including monocular (2D), multi-view static (3D), and multi-view dynamic (4D) datasets, to train a generalist spatiotemporal implicit representation capable of producing zero-shot novel videos from arbitrary camera locations and trajectories. We evaluate AnyView on standard benchmarks, showing competitive results with the current state of the art, and propose AnyViewBench, a challenging new benchmark tailored towards extreme dynamic view synthesis in diverse real-world scenarios. In this more dramatic setting, we find that most baselines drastically degrade in performance, as they require significant overlap between viewpoints, while AnyView maintains the ability to produce realistic, plausible, and spatiotemporally consistent videos when prompted from any viewpoint. Results, data, code, and models can be viewed at: this https URL
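The abstract states that AnyView is trained on a mixture of data sources with different levels of supervision: monocular (2D), multi-view static (3D), and multi-view dynamic (4D). The page does not include code, so the sketch below is only a minimal, hypothetical illustration of how such mixed-supervision batch sampling could be organized; the DataSource and Clip classes, the sampling weights, and the sample_batch helper are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): assembling training batches
# from data sources with different supervision levels -- monocular (2D),
# multi-view static (3D), and multi-view dynamic (4D). All names, weights,
# and structures here are illustrative assumptions.
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clip:
    frames: list                  # RGB frames (placeholder)
    cameras: Optional[list]       # per-frame camera poses, if available
    supervision: str              # "2D", "3D", or "4D"


@dataclass
class DataSource:
    name: str
    supervision: str              # level of supervision this source provides
    weight: float                 # sampling probability (illustrative value)

    def sample_clip(self) -> Clip:
        # In practice this would load real video frames and camera poses;
        # here we return an empty placeholder clip.
        has_cameras = self.supervision in ("3D", "4D")
        return Clip(frames=[],
                    cameras=[] if has_cameras else None,
                    supervision=self.supervision)


SOURCES = [
    DataSource("monocular_videos", "2D", weight=0.4),
    DataSource("multiview_static", "3D", weight=0.3),
    DataSource("multiview_dynamic", "4D", weight=0.3),
]


def sample_batch(batch_size: int) -> list[Clip]:
    """Draw clips from heterogeneous sources in proportion to their weights."""
    weights = [s.weight for s in SOURCES]
    chosen = random.choices(SOURCES, weights=weights, k=batch_size)
    return [src.sample_clip() for src in chosen]


if __name__ == "__main__":
    batch = sample_batch(8)
    print([clip.supervision for clip in batch])
```

The point of the sketch is only the mixing strategy: clips without camera poses (2D) and clips with static or dynamic multi-view geometry (3D/4D) flow through the same batching interface, so a single model can consume all of them.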
