Spatio-Temporal Crop Aggregation for Video Representation Learning

30 November 2022

Papers citing "Spatio-Temporal Crop Aggregation for Video Representation Learning"

9 / 9 papers shown

Title
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning Nikhil Singh Chih-Wei Wu Iroro Orife Mahdi M. Kalayeh 23 2 0 12 Apr 2023
The Curious Case of Benign Memorization Sotiris Anagnostidis Gregor Bachmann Lorenzo Noci Thomas Hofmann AAML 43 8 0 25 Oct 2022
Exploring Target Representations for Masked Autoencoders Xingbin Liu Jinghao Zhou Tao Kong Xianming Lin Rongrong Ji 84 50 0 08 Sep 2022
Time-Equivariant Contrastive Video Representation Learning Simon Jenni Hailin Jin SSL AI4TS 140 60 0 07 Dec 2021
Masked Autoencoders Are Scalable Vision Learners Kaiming He Xinlei Chen Saining Xie Yanghao Li Piotr Dollár Ross B. Girshick ViT TPM 305 7,434 0 11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers Mathilde Caron Hugo Touvron Ishan Misra Hervé Jégou Julien Mairal Piotr Bojanowski Armand Joulin 317 5,775 0 29 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text Hassan Akbari Liangzhe Yuan Rui Qian Wei-Hong Chuang Shih-Fu Chang Yin Cui Boqing Gong ViT 248 577 0 22 Apr 2021
Learning to Anticipate Egocentric Actions by Imagination Yu Wu Linchao Zhu Xiaohan Wang Yi Yang Fei Wu EgoV 85 69 0 13 Jan 2021
Self-supervised Co-training for Video Representation Learning Tengda Han Weidi Xie Andrew Zisserman SSL 215 309 0 19 Oct 2020