
Appreciate the View: A Task-Aware Evaluation Framework for Novel View Synthesis

Main: 8 pages · Appendix: 7 pages · Bibliography: 2 pages · 14 figures · 10 tables
Abstract

The goal of Novel View Synthesis (NVS) is to generate realistic images of given content from unseen viewpoints. But how can we trust that a generated image truly reflects the intended transformation? Evaluating its reliability remains a major challenge. While recent generative models, particularly diffusion-based approaches, have significantly improved NVS quality, existing evaluation metrics struggle to assess whether a generated image is both realistic and faithful to the source view and the intended viewpoint transformation. Standard metrics, such as pixel-wise similarity and distribution-based measures, often mis-rank incorrect results because they fail to capture the nuanced relationship between the source image, the viewpoint change, and the generated output. We propose a task-aware evaluation framework that leverages features from a strong NVS foundation model, Zero123, combined with a lightweight tuning step to enhance discrimination. Using these features, we introduce two complementary evaluation metrics: a reference-based score, $D_{\text{PRISM}}$, and a reference-free score, $\text{MMD}_{\text{PRISM}}$. Both reliably identify incorrect generations and rank models in agreement with human preference studies, addressing a fundamental gap in NVS evaluation. Our framework provides a principled and practical approach to assessing synthesis quality, paving the way for more reliable progress in novel view synthesis. To further support this goal, we apply our reference-free metric to six NVS methods across three benchmarks: Toys4K, Google Scanned Objects (GSO), and OmniObject3D, where $\text{MMD}_{\text{PRISM}}$ produces a clear and stable ranking, with lower scores consistently indicating stronger models.
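A reference-free score of this kind typically compares the distribution of generated-image features against a reference feature distribution via Maximum Mean Discrepancy (MMD). As a minimal sketch only — the actual $\text{MMD}_{\text{PRISM}}$ uses features from a tuned Zero123 backbone, and the RBF kernel and bandwidth here are illustrative assumptions, not the paper's implementation — a biased MMD² estimate between two feature sets can be computed as:

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=1.0):
    """Biased MMD^2 estimate between feature sets X (n, d) and Y (m, d)
    using an RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)).
    Lower values indicate the two feature distributions are closer."""
    def gram(A, B):
        # Pairwise squared Euclidean distances via the expansion
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
        sq = (
            np.sum(A**2, axis=1)[:, None]
            + np.sum(B**2, axis=1)[None, :]
            - 2.0 * (A @ B.T)
        )
        return np.exp(-sq / (2.0 * sigma**2))

    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()
```

In an NVS evaluation loop, `X` would hold features of generated views and `Y` features of a held-out reference set; a lower score then corresponds to a stronger model, matching the ranking behavior described above.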
