SAP: Segment Any 4K Panorama

Lutao Jiang
Zidong Cao
Weikai Chen
Xu Zheng
Yuanhuiyi Lyu
Zhenyang Li
Zeyu HU
Yingda Yin
Keyang Luo
Runze Zhang
Kai Yan
Shengju Qian
Haidi Fan
Yifan Peng
Xin Wang
Hui Xiong
Ying-Cong Chen
Main: 14 Pages
7 Figures
Bibliography: 3 Pages
8 Tables
Appendix: 4 Pages
Abstract

Promptable instance segmentation is widely adopted in embodied and AR systems, yet the performance of foundation models trained on perspective imagery often degrades on 360° panoramas. In this paper, we introduce Segment Any 4K Panorama (SAP), a foundation model for 4K high-resolution panoramic instance-level segmentation. We reformulate panoramic segmentation as fixed-trajectory perspective video segmentation, decomposing a panorama into overlapping perspective patches sampled along a continuous spherical traversal. This memory-aligned reformulation preserves native 4K resolution while restoring the smooth viewpoint transitions required for stable cross-view propagation. To enable large-scale supervision, we synthesize 183,440 4K-resolution panoramic images with instance segmentation labels using the InfiniGen engine. Trained under this trajectory-aligned paradigm, SAP generalizes effectively to real-world 360° images, achieving a +17.2 zero-shot mIoU gain over vanilla SAM2 models of different sizes on a real-world 4K panorama benchmark.
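The decomposition described in the abstract can be illustrated with a minimal sketch, assuming an equirectangular input panorama. The abstract does not state the field of view, patch size, overlap, or traversal order, so the 90° field of view, the 45° yaw and pitch steps, the serpentine ordering, and the function names `perspective_patch`, `trajectory`, and `panorama_to_frames` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def perspective_patch(pano, yaw_deg, pitch_deg, fov_deg=90.0, size=64):
    """Sample one pinhole view from an equirectangular panorama (nearest neighbour)."""
    h, w = pano.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2.0)  # focal length in pixels
    # Pixel grid centred on the optical axis (x right, y down, z forward).
    u = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(u, u)
    dirs = np.stack([x, y, np.full_like(x, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    # Rotate about x (positive pitch looks up), then about y (yaw).
    rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    d = dirs @ (ry @ rx).T
    lon = np.arctan2(d[..., 0], d[..., 2])        # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))    # latitude, positive = down
    col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return pano[row, col]

def trajectory(yaw_step=45.0, pitches=(-45.0, 0.0, 45.0)):
    """Assumed fixed serpentine path: consecutive views differ by one step only."""
    yaws = np.arange(-180.0, 180.0, yaw_step)
    views = []
    for i, p in enumerate(pitches):
        row = yaws if i % 2 == 0 else yaws[::-1]  # reverse every other row
        views.extend((float(yw), float(p)) for yw in row)
    return views

def panorama_to_frames(pano, fov_deg=90.0, size=64):
    """Stack the patches in trajectory order, forming a video-like sequence."""
    return np.stack([perspective_patch(pano, yw, p, fov_deg, size)
                     for yw, p in trajectory()])
```

With a 90° field of view and 45° steps, neighbouring patches overlap by roughly half, and the reversed ordering on alternate pitch rows avoids a viewpoint jump at the end of each row. The resulting frame stack could then be passed to a video segmentation model; for a real 4K panorama, `size` would be set much larger than the small default used here.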
