Robust Video Synchronization using Unsupervised Deep Learning

Aligning video sequences is a fundamental yet unsolved building block for a wide range of applications in computer graphics and vision, especially when the target clips exhibit extensively varying appearance. Building on recent advances in deep learning, we present a scalable and robust method for computing optimal non-linear temporal video alignments. The presented algorithm learns to retrieve and match similar video frames from input sequences in an unsupervised fashion, without human interaction or additional annotations. We present an iterative scheme that leverages the structure of the videos themselves to remove the need for labels, and we incorporate a variant of Dijkstra's shortest-path algorithm both to extract meaningful training examples and to compute a robust final alignment. While previous methods assume similar conditions, such as weather, season, and illumination, our approach aligns videos robustly regardless of such variation. This enables new ways of compositing video clips from data recorded months apart, across seasons.
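The abstract does not specify the alignment procedure in detail, but the shortest-path idea it mentions can be sketched as follows: given a frame-dissimilarity matrix between two videos, Dijkstra's algorithm finds the minimum-cost monotonic path through the matrix, which serves as a non-linear temporal alignment. The function name, the cost-matrix convention, and the three-move step set are illustrative assumptions, not the paper's actual implementation.

```python
import heapq

def align_videos(cost):
    """Sketch of shortest-path video alignment (assumed setup).

    cost[i][j] is the dissimilarity between frame i of video A and
    frame j of video B. Returns the minimum-cost monotonic path from
    (0, 0) to (M-1, N-1), where each step advances in A, in B, or in
    both, plus the total path cost.
    """
    m, n = len(cost), len(cost[0])
    start, goal = (0, 0), (m - 1, n - 1)
    dist = {start: cost[0][0]}
    prev = {}
    heap = [(cost[0][0], start)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == goal:
            break
        if d > dist[(i, j)]:
            continue  # stale heap entry
        for di, dj in ((1, 0), (0, 1), (1, 1)):  # monotonic moves only
            ni, nj = i + di, j + dj
            if ni < m and nj < n:
                nd = d + cost[ni][nj]
                if nd < dist.get((ni, nj), float("inf")):
                    dist[(ni, nj)] = nd
                    prev[(ni, nj)] = (i, j)
                    heapq.heappush(heap, (nd, (ni, nj)))
    # Backtrack the optimal alignment path from the goal.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]
```

In the paper's unsupervised scheme, pairs of frames lying on (or far from) such a path could serve as positive (or negative) training examples for the frame-matching network, which in turn refines the cost matrix on the next iteration.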