Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection

13 May 2025
Ayush Rai
Kyle Min
Tarun Krishna
Feiyan Hu
A. Smeaton
Noel E. O'Connor
Abstract

Masked video modeling (MVM) has emerged as a highly effective pre-training strategy for visual foundation models, whereby the model reconstructs masked spatiotemporal tokens using information from visible tokens. However, a key challenge in such approaches lies in selecting an appropriate masking strategy. Previous studies have explored predefined masking techniques, including random and tube-based masking, as well as approaches that leverage key motion priors, optical flow, and semantic cues from externally pre-trained models. In this work, we introduce a novel and generalizable Trajectory-Aware Adaptive Token Sampler (TATS), which models the motion dynamics of tokens and can be seamlessly integrated into the masked autoencoder (MAE) framework to select motion-centric tokens in videos. Additionally, we propose a unified training strategy that enables joint optimization of both MAE and TATS from scratch using Proximal Policy Optimization (PPO). We show that our model allows for aggressive masking without compromising performance on the downstream task of action recognition, while also keeping pre-training memory efficient. Extensive experiments with the proposed approach across four benchmarks (Something-Something v2, Kinetics-400, UCF101, and HMDB51) demonstrate the effectiveness, transferability, generalization, and efficiency of our work compared to other state-of-the-art methods.
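For readers unfamiliar with the predefined baselines the abstract mentions, the following is a minimal sketch of random and tube-based masking over a grid of video tokens. It is not the TATS sampler or the authors' code; the function names, the 8 x 14 x 14 token grid, and the 0.9 masking ratio are illustrative assumptions only.

```python
import random

def random_mask(t, h, w, ratio, seed=0):
    """Hide a fixed fraction of all t*h*w tokens, chosen independently of position."""
    rng = random.Random(seed)
    n = t * h * w
    masked = set(rng.sample(range(n), int(n * ratio)))
    # mask[f][i] is True when token i of frame f is hidden from the encoder
    return [[(f * h * w + i) in masked for i in range(h * w)] for f in range(t)]

def tube_mask(t, h, w, ratio, seed=0):
    """Hide the same spatial positions in every frame, forming temporal tubes."""
    rng = random.Random(seed)
    n = h * w
    masked = set(rng.sample(range(n), int(n * ratio)))
    frame = [i in masked for i in range(n)]
    return [list(frame) for _ in range(t)]

# Hypothetical grid: 8 temporal slots, 14 x 14 spatial patches, 90% masked.
rand = random_mask(8, 14, 14, 0.9)
tube = tube_mask(8, 14, 14, 0.9)
```

Both functions ignore video content entirely, which is the limitation that motivates content-aware samplers: a tube mask repeats one spatial pattern across all frames, while a random mask varies per frame but is equally blind to motion.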

@article{rai2025_2505.08561,
  title={Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection},
  author={Ayush K. Rai and Kyle Min and Tarun Krishna and Feiyan Hu and Alan F. Smeaton and Noel E. O'Connor},
  journal={arXiv preprint arXiv:2505.08561},
  year={2025}
}