Fast-SAM3D: 3Dfy Anything in Images but Faster

Weilun Feng
Mingqiang Wu
Zhiliang Chen
Chuanguang Yang
Haotong Qin
Yuqi Li
Xiaokun Liu
Guoxin Fan
Zhulin An
Libo Huang
Yulun Zhang
Michele Magno
Yongjun Xu
Main: 8 pages, 12 figures, 10 tables; bibliography: 3 pages; appendix: 7 pages
Abstract

SAM3D enables scalable, open-world 3D reconstruction from complex scenes, yet its deployment is hindered by prohibitive inference latency. In this work, we conduct the first systematic investigation into its inference dynamics, revealing that generic acceleration strategies are brittle in this context. We show that these failures stem from neglecting the pipeline's inherent multi-level heterogeneity: the kinematic distinctiveness between shape and layout, the intrinsic sparsity of texture refinement, and the spectral variance across geometries. To address this, we present Fast-SAM3D, a training-free framework that dynamically aligns computation with instantaneous generation complexity. Our approach integrates three heterogeneity-aware mechanisms: (1) Modality-Aware Step Caching, which decouples structural evolution from sensitive layout updates; (2) Joint Spatiotemporal Token Carving, which concentrates refinement on high-entropy regions; and (3) Spectral-Aware Token Aggregation, which adapts decoding resolution. Extensive experiments demonstrate that Fast-SAM3D delivers up to a 2.67× end-to-end speedup with negligible fidelity loss, establishing a new Pareto frontier for efficient single-view 3D generation. Our code is released at this https URL.
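To make the "Modality-Aware Step Caching" idea concrete, here is a minimal toy sketch (not the paper's implementation; all names, the update rule, and the caching interval are illustrative assumptions). It shows the general training-free caching pattern the abstract describes: a slowly evolving "shape" branch reuses a cached network output across several denoising steps, while the drift-sensitive "layout" branch is recomputed at every step.

```python
import numpy as np

def generate(x, steps, shape_fn, layout_fn, shape_interval=3):
    """Toy step-caching loop (illustrative only, not Fast-SAM3D itself).

    shape_fn / layout_fn stand in for the two branches of a denoiser:
    the shape branch is assumed to change smoothly, so its output is
    cached and reused for `shape_interval` steps; the layout branch is
    treated as cache-sensitive and evaluated every step.
    """
    shape_cache = None
    shape_evals = layout_evals = 0
    for t in range(steps):
        # Recompute the cached branch only on schedule.
        if shape_cache is None or t % shape_interval == 0:
            shape_cache = shape_fn(x, t)
            shape_evals += 1
        layout = layout_fn(x, t)  # always fresh
        layout_evals += 1
        x = x + 0.1 * (shape_cache + layout)  # toy update rule
    return x, shape_evals, layout_evals
```

With 9 steps and `shape_interval=3`, the shape branch runs only 3 times while the layout branch runs all 9, which is the source of the speedup: the expensive branch is evaluated on a coarser schedule without altering the per-step update structure.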
