Spatiotemporal predictive learning aims to predict future frames from historical observations. Previous work improves performance by making networks wider and deeper, but this incurs a large memory overhead that seriously hinders the development and deployment of the technology. Scale is another dimension for improving model performance in common computer vision tasks: operating at multiple scales can reduce computational requirements and capture broader context. This important dimension has not yet been explored by recent RNN models. In this paper, drawing on the benefits of multi-scale processing, we propose a general framework named Multi-Scale RNN (MS-RNN) to boost recent RNN models. We verify the MS-RNN framework through extensive experiments with 6 popular RNN models (ConvLSTM, TrajGRU, PredRNN, PredRNN++, MIM, and MotionRNN) on 4 different datasets (Moving MNIST, KTH, TaxiBJ, and HKO-7). The results show that RNN models incorporating our framework achieve much lower memory cost yet better performance than their original counterparts. Our code is released at \url{https://github.com/mazhf/MS-RNN}.
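The core multi-scale idea can be illustrated with a minimal sketch: downsample features before the recurrent step, run the cell at the coarser resolution (shrinking the hidden-state memory and widening each cell's effective receptive field), then upsample the output. This is a conceptual illustration only, not the paper's actual MS-RNN architecture; all names (`toy_cell`, `multiscale_step`) are hypothetical, and `toy_cell` stands in for any convolutional RNN cell such as a ConvLSTM step.

```python
import numpy as np

def downsample(x, factor=2):
    """Average-pool an (H, W, C) feature map by `factor` (H, W divisible by factor)."""
    h, w, c = x.shape
    return x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def upsample(x, factor=2):
    """Nearest-neighbour upsample an (H, W, C) feature map by `factor`."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def toy_cell(x, h):
    """Hypothetical stand-in for a convolutional RNN cell (e.g. one ConvLSTM step)."""
    return np.tanh(x + h)

def multiscale_step(x, h_coarse, factor=2):
    """One recurrent step at reduced resolution: downsample -> recur -> upsample.
    The hidden state uses factor**2 times less memory, and each coarse cell
    covers a wider spatial context of the input."""
    x_small = downsample(x, factor)
    h_coarse = toy_cell(x_small, h_coarse)
    return upsample(h_coarse, factor), h_coarse

# Example: an 8x8x3 input recurs over a 4x4x3 hidden state.
x = np.random.rand(8, 8, 3)
h = np.zeros((4, 4, 3))
y, h = multiscale_step(x, h)   # y: (8, 8, 3), h: (4, 4, 3)
```

The real framework wraps existing RNN models with such encode/decode scale transitions; the sketch only shows why the coarse-resolution recurrence saves memory while enlarging context.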