Disambiguation for Video Frame Interpolation

Zhihang Zhong
Yiming Zhang
Wei Wang
Xiao Sun
Yu Qiao
Gurunandan Krishnan
Sizhuo Ma
Jian Wang
Abstract

Existing video frame interpolation (VFI) methods blindly predict where each object is at a specific timestep t ("time indexing"), and therefore struggle to predict precise object movements. Given two images of a baseball, there are infinitely many possible trajectories: accelerating or decelerating, straight or curved. This often results in blurry frames, as the method averages out these possibilities. Instead of forcing the network to learn this complicated time-to-location mapping implicitly, we provide the network with an explicit hint on how far the object has traveled between the start and end frames, a novel approach termed "distance indexing". This method offers a clearer learning goal for models, reducing the uncertainty tied to object speeds. However, even with this extra guidance, objects can still be blurry, especially when they are equally far from both input frames, due to the directional ambiguity in long-range motion. To solve this, we propose an iterative reference-based estimation strategy that breaks down a long-range prediction into several short-range steps. When our plug-and-play strategies are integrated into state-of-the-art learning-based models, the models exhibit markedly superior perceptual quality in arbitrary-time interpolation, using a uniform distance indexing map in the same format as time indexing without requiring extra computation. Furthermore, we demonstrate that if additional latency is acceptable, a continuous map estimator can be employed to compute a pixel-wise dense distance indexing map using multiple nearby frames. Combined with efficient multi-frame refinement, this extension can further disambiguate complex motion, thus enhancing performance both qualitatively and quantitatively. Additionally, the ability to manually specify distance indexing allows for independent temporal manipulation of each object, providing a novel tool for video editing tasks such as re-timing.
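
The two plug-and-play ideas in the abstract, a uniform distance indexing map and iterative short-range estimation, can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`uniform_index_map`, `short_range_targets`, `iterative_interpolate`, `blend_model`) is hypothetical, the evenly spaced intermediate targets and the use of the previous estimate as the reference are assumptions, and `blend_model` is a plain linear blend standing in for a learned VFI network.

```python
from typing import Callable, List

Frame = List[List[float]]     # toy grayscale frame: rows of pixel values
IndexMap = List[List[float]]  # per-pixel index in [0, 1], same layout as a frame

# Hypothetical model signature (an assumption, not the paper's API):
# (start, end, target_map, reference_frame, reference_map) -> predicted frame
Model = Callable[[Frame, Frame, IndexMap, Frame, IndexMap], Frame]


def uniform_index_map(height: int, width: int, value: float) -> IndexMap:
    """Build a constant map: every pixel carries the same index value."""
    return [[value] * width for _ in range(height)]


def short_range_targets(target: float, steps: int) -> List[float]:
    """Split one long-range target index into evenly spaced short-range targets."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [target * k / steps for k in range(1, steps + 1)]


def iterative_interpolate(model: Model, start: Frame, end: Frame,
                          target: float, steps: int) -> Frame:
    """Reach the target index in several steps, reusing each estimate as the next reference."""
    height, width = len(start), len(start[0])
    ref_frame, ref_index = start, 0.0  # the first reference is the start frame itself
    for index in short_range_targets(target, steps):
        ref_frame = model(
            start, end,
            uniform_index_map(height, width, index),
            ref_frame,
            uniform_index_map(height, width, ref_index),
        )
        ref_index = index
    return ref_frame


def blend_model(start: Frame, end: Frame, target_map: IndexMap,
                ref_frame: Frame, ref_map: IndexMap) -> Frame:
    """Placeholder for a learned network: per-pixel linear blend that ignores the reference."""
    return [
        [(1.0 - d) * a + d * b for a, b, d in zip(row_a, row_b, row_d)]
        for row_a, row_b, row_d in zip(start, end, target_map)
    ]
```

Calling `iterative_interpolate(blend_model, start, end, 0.5, 2)` first requests index 0.25 and then index 0.5, passing the 0.25 estimate as the reference for the second call. Because the map is per-pixel, a non-uniform map could assign different indices to different regions, which is the kind of input the re-timing use case would rely on.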

@article{zhong2025_2311.08007,
  title={Disambiguation for Video Frame Interpolation},
  author={Zhihang Zhong and Yiming Zhang and Wei Wang and Xiao Sun and Yu Qiao and Gurunandan Krishnan and Sizhuo Ma and Jian Wang},
  journal={arXiv preprint arXiv:2311.08007},
  year={2025}
}