MTIL: Encoding Full History with Mamba for Temporal Imitation Learning

Standard imitation learning (IL) methods have achieved considerable success in robotics, yet they often rely on the Markov assumption, which limits their applicability to tasks where historical context is crucial for disambiguating current observations. This limitation hinders performance in long-horizon sequential manipulation tasks where the correct action depends on past events not fully captured by the current state. To address this fundamental challenge, we introduce Mamba Temporal Imitation Learning (MTIL), a novel approach that leverages the recurrent state dynamics inherent in State Space Models (SSMs), specifically the Mamba architecture. MTIL encodes the entire trajectory history into a compressed hidden state, conditioning action predictions on this comprehensive temporal context alongside current multi-modal observations. Through extensive experiments on simulated benchmarks (ACT dataset tasks, Robomimic, LIBERO) and real-world sequential manipulation tasks specifically designed to probe temporal dependencies, MTIL significantly outperforms state-of-the-art methods such as ACT and Diffusion Policy. Our findings affirm the necessity of full temporal context for robust sequential decision-making and validate MTIL as a powerful approach that transcends the inherent limitations of Markovian imitation learning.
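The core idea can be illustrated with a minimal sketch (not the authors' code): a stable diagonal state-space recurrence compresses the full observation history into a fixed-size hidden state, which then conditions the action prediction together with the current observation. All dimensions and parameters below are hypothetical stand-ins for learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, STATE_DIM, ACT_DIM = 8, 16, 4  # assumed sizes for illustration

# Randomly initialized stand-ins for learned SSM parameters.
A = np.exp(-rng.uniform(0.1, 1.0, STATE_DIM))        # stable diagonal decay
B = rng.normal(0, 0.1, (STATE_DIM, OBS_DIM))         # input projection
W = rng.normal(0, 0.1, (ACT_DIM, STATE_DIM + OBS_DIM))  # action head

def rollout(observations):
    """Predict an action at every step, carrying history in hidden state h."""
    h = np.zeros(STATE_DIM)
    actions = []
    for o in observations:
        h = A * h + B @ o                # recurrent state update (SSM step)
        feat = np.concatenate([h, o])    # history summary + current observation
        actions.append(np.tanh(W @ feat))
    return np.array(actions)

traj = rng.normal(size=(10, OBS_DIM))    # a toy 10-step observation trajectory
acts = rollout(traj)
print(acts.shape)  # (10, 4): one action per timestep, conditioned on history
```

Because the hidden state is updated recurrently, inference cost per step is constant regardless of trajectory length, in contrast to attention-based policies whose context window grows with history.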
@article{zhou2025_2505.12410,
  title={MTIL: Encoding Full History with Mamba for Temporal Imitation Learning},
  author={Yulin Zhou and Yuankai Lin and Fanzhe Peng and Jiahui Chen and Zhuang Zhou and Kaiji Huang and Hua Yang and Zhouping Yin},
  journal={arXiv preprint arXiv:2505.12410},
  year={2025}
}