ContextVP: Fully Context-Aware Video Prediction

Abstract

Video prediction models based on convolutional networks, recurrent networks, and their combinations often result in blurry predictions. We identify an important contributing factor for imprecise predictions that has not been studied adequately in the literature: blind spots, i.e., lack of access to all relevant past information for accurately predicting the future. To address this issue, we introduce a fully context-aware architecture that captures the entire available past context for each pixel using Parallel Multi-Dimensional LSTM units and aggregates it using blending units. Our model outperforms a strong baseline network of 20 recurrent convolutional layers and yields state-of-the-art performance for next step prediction. Moreover, it does so with fewer parameters than several recently proposed models, and does not rely on deep convolutional networks, multi-scale architectures, separation of background and foreground modeling, motion flow learning, or adversarial training. These results highlight that full awareness of past context is of crucial importance for video prediction.
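The abstract describes aggregating per-pixel context from multiple directional scans with "blending units". As a rough illustration only (not the authors' implementation; the function name, weighting scheme, and shapes are hypothetical), a blending unit could combine the context maps produced by several directional passes with a learned softmax weighting per direction:

```python
import numpy as np

# Hypothetical sketch: blend per-pixel context maps from four scan
# directions (e.g., four Multi-Dimensional LSTM passes) into one map.
# The softmax weighting over directions is an assumption for illustration.
def blend_contexts(contexts, logits):
    """contexts: list of D arrays of shape (H, W, C); logits: (D,)."""
    stacked = np.stack(contexts, axis=0)      # (D, H, W, C)
    w = np.exp(logits - logits.max())
    w /= w.sum()                              # softmax over the D directions
    # Contract the direction axis: weighted sum of the D context maps.
    return np.tensordot(w, stacked, axes=1)   # (H, W, C)

# Toy usage: four constant "context maps" with values 0, 1, 2, 3.
ctxs = [np.full((2, 2, 3), float(d)) for d in range(4)]
out = blend_contexts(ctxs, np.zeros(4))       # uniform weights -> mean = 1.5
```

With equal logits the blend reduces to a plain average of the directional contexts; in the paper's setting the weights would instead be learned per pixel.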
