VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 719 papers shown

Title
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm Haoyi Zhu Honghui Yang Xiaoyang Wu Di Huang Sha Zhang ... Hengshuang Zhao Chunhua Shen Yu Qiao Tong He Wanli Ouyang SSL 77 43 0 12 Oct 2023
Boundary Discretization and Reliable Classification Network for Temporal Action Detection Zhenying Fang Jun Yu Richang Hong 30 0 0 10 Oct 2023
Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning Yinda Chen Wei Huang Shenglong Zhou Qi Chen Zhiwei Xiong 36 25 0 06 Oct 2023
Diffusion Models as Masked Audio-Video Learners Elvis Nunez Yanzi Jin Mohammad Rastegari Sachin Mehta Maxwell Horton 25 2 0 05 Oct 2023
Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition Hamid Reza Mohammadi Ehsan Nazerfard Tahereh Firoozi ViT 27 2 0 04 Oct 2023
Multiple Physics Pretraining for Physical Surrogate Models Michael McCabe Bruno Régaldo-Saint Blancard Liam Parker Ruben Ohana M. Cranmer ... Francois Lanusse Mariel Pettee Tiberiu Teşileanu Kyunghyun Cho Shirley Ho PINN AI4CE 42 54 0 04 Oct 2023
A Spatio-Temporal Attention-Based Method for Detecting Student Classroom Behaviors Fan Yang 35 2 0 04 Oct 2023
How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing Shutong Jin Ruiyu Wang Muhammad Zahid Florian T. Pokorny 38 1 0 03 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video Xinhao Li Yuhan Zhu Limin Wang VLM 37 8 0 02 Oct 2023
Win-Win: Training High-Resolution Vision Transformers from Two Windows Vincent Leroy Jérôme Revaud Thomas Lucas Philippe Weinzaepfel ViT 42 2 0 01 Oct 2023
SimLVSeg: Simplifying Left Ventricular Segmentation in 2D+Time Echocardiograms with Self- and Weakly-Supervised Learning F. Maani Asim Ukaye Nada Saadi Numan Saeed Mohammad Yaqub 218 1 0 30 Sep 2023
Towards Free Data Selection with General-Purpose Models Alessandro Mutti Mingyu Ding Patrizia Semeraro Wei Zhan 42 9 0 29 Sep 2023
CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding Mingming Zhang Qingjie Liu Yunhong Wang 37 5 0 28 Sep 2023
Training a Large Video Model on a Single Machine in a Day Yue Zhao Philipp Krahenbuhl VLM 41 15 0 28 Sep 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning Ruyang Liu Chen Li Yixiao Ge Ying Shan Thomas H. Li Ge Li 25 29 0 27 Sep 2023
M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding Muhammad Abdullah Jamal Omid Mohareri 3DPC 26 1 0 26 Sep 2023
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios Francesco Ragusa Rosario Leonardi Michele Mazzamuto Claudia Bonanno Rosario Scavo Antonino Furnari G. Farinella 37 7 0 26 Sep 2023
IBVC: Interpolation-driven B-frame Video Compression Chenming Xu Meiqin Liu Chao Yao Weisi Lin Yao Zhao 57 8 0 25 Sep 2023
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches Deepak Gupta Kush Attal Dina Demner-Fushman LM&MA 27 1 0 21 Sep 2023
AI Foundation Models for Weather and Climate: Applications, Design, and Implementation S. K. Mukkavilli Daniel Salles Civitarese J. Schmude Johannes Jakubik Anne Jones ... R. Ganti Hendrik Hamann U. Nair Rahul Ramachandran Kommy Weldemariam AI4Cl AI4CE 37 18 0 19 Sep 2023
FoleyGen: Visually-Guided Audio Generation Xinhao Mei Varun K. Nagaraja Gaël Le Lan Zhaoheng Ni Ernie Chang Yangyang Shi Vikas Chandra VGen 29 21 0 19 Sep 2023
Unsupervised Open-Vocabulary Object Localization in Videos Ke Fan Zechen Bai Tianjun Xiao Dominik Zietlow Max Horn ... Bernt Schiele Thomas Brox Zheng-Wei Zhang Yanwei Fu Tong He 53 9 0 18 Sep 2023
FrameRS: A Video Frame Compression Model Composed by Self supervised Video Frame Reconstructor and Key Frame Selector Qiqian Fu Guanhong Wang Gaoang Wang 30 0 0 16 Sep 2023
MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer Fudong Lin Summer Crawford Kaleb Guillot Yihe Zhang Yan Chen ... Tri Setiyono B. Tubana Lu Peng Magdy A. Bayoumi N. Tzeng 47 20 0 16 Sep 2023
RMP: A Random Mask Pretrain Framework for Motion Prediction Yi Yang Qingwen Zhang Thomas Gilles Nazre Batool John Folkesson 61 5 0 16 Sep 2023
AV-MaskEnhancer: Enhancing Video Representations through Audio-Visual Masked Autoencoder Xingjian Diao Ming Cheng Shitong Cheng VGen 32 8 0 15 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning Zhiwu Qing Shiwei Zhang Ziyuan Huang Yingya Zhang Changxin Gao Deli Zhao Nong Sang 37 18 0 14 Sep 2023
SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition Cong Wu Xiaojun Wu Josef Kittler Tianyang Xu Sara Atito Muhammad Awais Zhenhua Feng 43 3 0 11 Sep 2023
CDFSL-V: Cross-Domain Few-Shot Learning for Videos Sarinda Samarasinghe Mamshad Nayeem Rizve Navid Kardan M. Shah 27 11 0 07 Sep 2023
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers J. Denize Mykola Liashuha Jaonary Rabarisoa Astrid Orcesi Romain Hérault ViT 30 13 0 03 Sep 2023
RevColV2: Exploring Disentangled Representations in Masked Image Modeling Qi Han Yuxuan Cai Xiangyu Zhang 43 7 0 02 Sep 2023
Self-Supervised Video Transformers for Isolated Sign Language Recognition Marcelo Sandoval-Castaneda Yanhong Li D. Brentari Karen Livescu Gregory Shakhnarovich SLR 25 3 0 02 Sep 2023
CL-MAE: Curriculum-Learned Masked Autoencoders Neelu Madan Nicolae-Cătălin Ristea Kamal Nasrollahi T. Moeslund Radu Tudor Ionescu 26 10 0 31 Aug 2023
IndGIC: Supervised Action Recognition under Low Illumination Jing-Teng Zeng 35 1 0 29 Aug 2023
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction Umar Khalid Hasan Iqbal Saeed Vahidian Jing Hua Chong Chen 21 3 0 29 Aug 2023
Self-Supervision for Tackling Unsupervised Anomaly Detection: Pitfalls and Opportunities Leman Akoglu Jaemin Yoo 43 1 0 28 Aug 2023
EventTransAct: A video transformer-based framework for Event-camera based action recognition Tristan de Blegiers I. Dave Adeel Yousaf M. Shah ViT 46 9 0 25 Aug 2023
Attending Generalizability in Course of Deep Fake Detection by Exploring Multi-task Learning P. Balaji Abhijit Das Srijan Das A. Dantcheva CVBM 21 4 0 25 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning D. Fan Jue Wang Shuai Liao Yi Zhu Vimal Bhat H. Santos-Villalobos M. Rohith Xinyu Li VGen 37 19 0 24 Aug 2023
MOFO: MOtion FOcused Self-Supervision for Video Understanding Mona Ahmadian Frank Guerin Andrew Gilbert 44 2 0 23 Aug 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation Hejun Xiao Kunyu Peng Xiangsheng Huang Alina Roitberg Hao Li Zhao Wang Rainer Stiefelhagen 31 3 0 23 Aug 2023
Audio-Visual Class-Incremental Learning Weiguo Pian Shentong Mo Yunhui Guo Yapeng Tian CLL VLM 35 28 0 21 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding Bingkun Huang Zhiyu Zhao Guozhen Zhang Yu Qiao Limin Wang 44 30 0 21 Aug 2023
Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces Juan Hu Xin Liao Difei Gao Satoshi Tsutsui Qian Wang Zheng Qin Mike Zheng Shou CVBM AAML 27 1 0 19 Aug 2023
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos Zhiqiang Shen Xiaoxiao Sheng Hehe Fan Longguang Wang Y. Guo Qiong Liu Hao-Kai Wen Xiaoping Zhou 3DPC 20 14 0 18 Aug 2023
Learning to In-paint: Domain Adaptive Shape Completion for 3D Organ Segmentation Mingjin Chen Yongkang He Yongyi Lu Zhi-Yi Yang MedIm 23 1 0 17 Aug 2023
Memory-and-Anticipation Transformer for Online Action Understanding Jiahao Wang Guo Chen Yifei Huang Liming Wang Tong Lu OffRL 62 37 0 15 Aug 2023
A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis Esteve Valls Mascaro Hyemin Ahn Dongheui Lee CVBM 42 4 0 14 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding Ziyuan Huang Shiwei Zhang Liang Pan Zhiwu Qing Yingya Zhang Ziwei Liu Marcelo H. Ang 46 9 0 10 Aug 2023
Spatio-Temporal Encoding of Brain Dynamics with Surface Masked Autoencoders Simon Dahan Logan Z. J. Williams Yourong Guo Daniel Rueckert E. C. Robinson 46 0 0 10 Aug 2023