Top-down Flow Transformer Networks
We study the deformation fields of feature maps across convolutional network layers under explicit top-down spatial transformations. We propose the top-down flow transformer (TFT), focusing on three transformations: translation, rotation, and scaling. Flow transformation generators with controllable parameters are learned to account for the hidden-layer deformations while maintaining overall consistency across layers. The learned generators capture underlying feature transformation processes that are independent of the particular training images, as demonstrated by a comprehensive study on datasets including MNIST, shapes, and natural images. The proposed TFT framework brings insight to the important problem of understanding CNN internal feature representations and their transformation under top-down processes. TFT demonstrates a significant advantage over existing data-driven approaches in building data-independent transformations, and it can also be applied to data augmentation and transfer learning.
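To make the notion of a parameterized top-down spatial transformation concrete, the following is a minimal sketch of applying a controllable rotation/scaling/translation to a 2D feature map via inverse mapping with nearest-neighbor sampling. This is an illustrative stand-in for the idea of transforming hidden-layer feature maps, not the paper's learned flow generator; the function name and parameters are assumptions for illustration.

```python
import numpy as np

def affine_warp(feat, theta=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Warp a 2D feature map by a rotation (theta, radians), isotropic
    scale, and translation (shift, in (row, col) order), using inverse
    mapping with nearest-neighbor sampling about the map's center.
    Illustrative sketch only, not the paper's learned generator."""
    h, w = feat.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    out = np.zeros_like(feat)
    for y in range(h):
        for x in range(w):
            # Map each output coordinate back to its source location.
            yc, xc = y - cy - shift[0], x - cx - shift[1]
            ys = ( cos_t * yc + sin_t * xc) / scale + cy
            xs = (-sin_t * yc + cos_t * xc) / scale + cx
            yi, xi = int(round(ys)), int(round(xs))
            if 0 <= yi < h and 0 <= xi < w:
                out[y, x] = feat[yi, xi]
    return out
```

In the TFT setting, the analogous transformation parameters would be produced by learned generators conditioned on the desired top-down change, rather than specified by hand as here.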
View on arXiv