Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking
Cheng-Yen Yang
Hsiang-Wei Huang
Pyong-Kun Kim
Chien-Kai Kuo
Jui-Wei Chang
Kwang-Ju Kim
Chung-I Huang
Jenq-Neng Hwang

Abstract
We present an effective approach for adapting the Segment Anything Model 2 (SAM 2) to the Visual Object Tracking (VOT) task. Our method leverages the powerful pre-trained capabilities of SAM 2 and incorporates several key techniques to enhance its performance in VOT applications. By combining SAM 2 with our proposed optimizations, we achieved a first-place AUC score of 89.4 on the 2024 ICPR Multi-Modal Object Tracking challenge, demonstrating the effectiveness of our approach. This paper details our methodology and the specific enhancements made to SAM 2, and provides a comprehensive analysis of our results, covering both the VOT task and the multi-modal aspects of the dataset.
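The abstract does not spell out the adaptation details, but two standard ingredients of using a segmentation model such as SAM 2 for box-based tracking are implied: converting each predicted mask into a bounding box for the tracker output, and scoring tracks with the success-plot AUC metric used by the challenge. Below is a minimal sketch of both, assuming the common OTB-style evaluation convention (overlap thresholds swept from 0 to 1); the function names and threshold grid are illustrative, not taken from the paper.

```python
import numpy as np

def mask_to_box(mask):
    """Tightest (x1, y1, x2, y2) box around a binary mask; None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    # +1 so the box bounds are exclusive on the right/bottom edge
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def success_auc(ious, thresholds=np.linspace(0.0, 1.0, 21)):
    """Area under the success plot: the fraction of frames whose overlap
    exceeds each threshold, averaged over the threshold grid."""
    ious = np.asarray(ious, dtype=float)
    return float(np.mean([(ious > t).mean() for t in thresholds]))
```

In this setup, the per-frame pipeline is: prompt the model, take the bounding box of the returned mask via `mask_to_box`, compute `iou` against the ground-truth box, and summarize the sequence with `success_auc`.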
@article{yang2025_2505.18111,
  title   = {Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking},
  author  = {Cheng-Yen Yang and Hsiang-Wei Huang and Pyong-Kun Kim and Chien-Kai Kuo and Jui-Wei Chang and Kwang-Ju Kim and Chung-I Huang and Jenq-Neng Hwang},
  journal = {arXiv preprint arXiv:2505.18111},
  year    = {2025}
}