THU-Warwick Submission for EPIC-KITCHEN Challenge 2025: Semi-Supervised Video Object Segmentation

Main: 2 pages
1 figure
Bibliography: 2 pages
2 tables
Abstract
In this report, we describe our approach to egocentric video object segmentation. Our method combines large-scale visual pretraining from SAM2 with depth-based geometric cues to handle complex scenes and long-term tracking. By integrating these signals in a unified framework, we achieve strong segmentation performance. On the VISOR test set, our method reaches a J&F score of 90.1%.
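The abstract reports a J&F score without defining it. As a minimal sketch, assuming the standard DAVIS-style definition used by video object segmentation benchmarks, J is the region similarity (intersection over union of predicted and ground-truth masks) and J&F is the mean of J and the boundary F-measure. The function names below are hypothetical and are not taken from the authors' code; the boundary F-measure itself is not implemented here and is passed in as a precomputed value.

```python
def region_similarity(pred, gt):
    """Jaccard index (J) between two binary masks given as nested sequences."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += bool(p) and bool(g)  # pixel is in both masks
            union += bool(p) or bool(g)   # pixel is in at least one mask
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_mean, f_mean):
    """Combine mean region similarity and mean boundary F-measure."""
    return 0.5 * (j_mean + f_mean)


pred_mask = [[1, 1], [0, 0]]
gt_mask = [[1, 0], [0, 0]]
j = region_similarity(pred_mask, gt_mask)  # 1 shared pixel / 2 covered pixels
```

In an evaluation, J and F would each be averaged over all objects and frames before being combined, so the single number quoted above summarizes both mask overlap and contour accuracy.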
@article{gao2025_2506.06748,
  title={THU-Warwick Submission for EPIC-KITCHEN Challenge 2025: Semi-Supervised Video Object Segmentation},
  author={Mingqi Gao and Haoran Duan and Tianlu Zhang and Jungong Han},
  journal={arXiv preprint arXiv:2506.06748},
  year={2025}
}