
Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting

Inseok Jeon
Minhyeok Lee
Seunghoon Lee
Minseok Kang
Suhwan Cho
Sangyoun Lee
Main: 17 pages, Bibliography: 2 pages, 19 figures, 6 tables
Abstract

Video outpainting aims to expand the visible content of a video beyond the original frame boundaries while preserving spatial fidelity and temporal coherence across frames. Existing methods primarily rely on large-scale generative models, such as diffusion models. However, generation-based approaches suffer from implicit temporal modeling and limited spatial context. These limitations lead to intra-frame and inter-frame inconsistencies, which become particularly pronounced in dynamic scenes and large outpainting scenarios. To overcome these challenges, we propose Seen-to-Scene, a novel framework that unifies the propagation-based and generation-based paradigms for video outpainting. Specifically, Seen-to-Scene leverages flow-based propagation with a flow completion network pre-trained for video inpainting, which is fine-tuned end-to-end to bridge the domain gap and reconstruct coherent motion fields. To further improve the efficiency and reliability of propagation, we introduce a reference-guided latent propagation scheme that effectively propagates source content across frames. Extensive experiments demonstrate that our method achieves superior temporal coherence and visual realism with efficient inference, surpassing even prior state-of-the-art methods that require input-specific adaptation.
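The abstract gives no implementation details, so the following is only a minimal sketch of the flow-based propagation idea it describes: given a completed optical flow field from a target frame to a neighbor (e.g., produced by a fine-tuned flow completion network), seen pixels are warped into the padded, unseen canvas, and whatever the flow cannot reach is left for the generative branch. The function `propagate`, its tensor shapes, and the validity-mask convention are illustrative assumptions, not the authors' API.

```python
# Illustrative sketch of flow-based propagation for outpainting (not the
# paper's implementation). Seen content from a neighboring frame is warped
# into the target frame's padded canvas; unreached pixels stay masked so a
# generative model can fill them.
import torch
import torch.nn.functional as F


def propagate(src, flow, valid_src):
    """Backward-warp `src` into the target frame's coordinates.

    src:       (B, C, H, W) neighbor frame, zero-padded to the target canvas
    flow:      (B, 2, H, W) completed flow from target to `src` (x, y offsets)
    valid_src: (B, 1, H, W) mask of pixels in `src` that are actually seen
    Returns the warped frame and a mask of target pixels that received
    valid (seen) content.
    """
    B, _, H, W = src.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=src.device, dtype=src.dtype),
        torch.arange(W, device=src.device, dtype=src.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0)[None].expand(B, -1, -1, -1)
    coords = grid + flow  # where each target pixel samples from in `src`
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(W - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(H - 1, 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    warped = F.grid_sample(src, sample_grid, align_corners=True)
    # A target pixel counts as filled only if all of its bilinear source
    # neighbors were seen pixels (out-of-range samples return zero).
    filled = F.grid_sample(valid_src, sample_grid, align_corners=True) > 0.99
    return warped, filled
```

In a full pipeline, the propagated pixels would keep the seen content intact, and the remaining holes (`~filled`) would be synthesized by the generative branch, e.g. `out = torch.where(filled, warped, generated)`.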
