VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
    ViT

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 719 papers shown
Title
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
Haoyi Zhu
Honghui Yang
Xiaoyang Wu
Di Huang
Sha Zhang
...
Hengshuang Zhao
Chunhua Shen
Yu Qiao
Tong He
Wanli Ouyang
SSL
77
43
0
12 Oct 2023
Boundary Discretization and Reliable Classification Network for Temporal Action Detection
Zhenying Fang
Jun Yu
Richang Hong
30
0
0
10 Oct 2023
Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning
Yinda Chen
Wei Huang
Shenglong Zhou
Qi Chen
Zhiwei Xiong
36
25
0
06 Oct 2023
Diffusion Models as Masked Audio-Video Learners
Elvis Nunez
Yanzi Jin
Mohammad Rastegari
Sachin Mehta
Maxwell Horton
25
2
0
05 Oct 2023
Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition
Hamid Reza Mohammadi
Ehsan Nazerfard
Tahereh Firoozi
ViT
27
2
0
04 Oct 2023
Multiple Physics Pretraining for Physical Surrogate Models
Michael McCabe
Bruno Régaldo-Saint Blancard
Liam Parker
Ruben Ohana
M. Cranmer
...
Francois Lanusse
Mariel Pettee
Tiberiu Teşileanu
Kyunghyun Cho
Shirley Ho
PINN
AI4CE
42
54
0
04 Oct 2023
A Spatio-Temporal Attention-Based Method for Detecting Student Classroom Behaviors
Fan Yang
35
2
0
04 Oct 2023
How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing
Shutong Jin
Ruiyu Wang
Muhammad Zahid
Florian T. Pokorny
38
1
0
03 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
37
8
0
02 Oct 2023
Win-Win: Training High-Resolution Vision Transformers from Two Windows
Vincent Leroy
Jérôme Revaud
Thomas Lucas
Philippe Weinzaepfel
ViT
42
2
0
01 Oct 2023
SimLVSeg: Simplifying Left Ventricular Segmentation in 2D+Time Echocardiograms with Self- and Weakly-Supervised Learning
F. Maani
Asim Ukaye
Nada Saadi
Numan Saeed
Mohammad Yaqub
218
1
0
30 Sep 2023
Towards Free Data Selection with General-Purpose Models
Alessandro Mutti
Mingyu Ding
Patrizia Semeraro
Wei Zhan
42
9
0
29 Sep 2023
CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding
Mingming Zhang
Qingjie Liu
Yunhong Wang
37
5
0
28 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
41
15
0
28 Sep 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu
Chen Li
Yixiao Ge
Ying Shan
Thomas H. Li
Ge Li
25
29
0
27 Sep 2023
M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
26
1
0
26 Sep 2023
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios
Francesco Ragusa
Rosario Leonardi
Michele Mazzamuto
Claudia Bonanno
Rosario Scavo
Antonino Furnari
G. Farinella
37
7
0
26 Sep 2023
IBVC: Interpolation-driven B-frame Video Compression
Chenming Xu
Meiqin Liu
Chao Yao
Weisi Lin
Yao Zhao
57
8
0
25 Sep 2023
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
Deepak Gupta
Kush Attal
Dina Demner-Fushman
LM&MA
27
1
0
21 Sep 2023
AI Foundation Models for Weather and Climate: Applications, Design, and Implementation
S. K. Mukkavilli
Daniel Salles Civitarese
J. Schmude
Johannes Jakubik
Anne Jones
...
R. Ganti
Hendrik Hamann
U. Nair
Rahul Ramachandran
Kommy Weldemariam
AI4Cl
AI4CE
37
18
0
19 Sep 2023
FoleyGen: Visually-Guided Audio Generation
Xinhao Mei
Varun K. Nagaraja
Gaël Le Lan
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
VGen
29
21
0
19 Sep 2023
Unsupervised Open-Vocabulary Object Localization in Videos
Ke Fan
Zechen Bai
Tianjun Xiao
Dominik Zietlow
Max Horn
...
Bernt Schiele
Thomas Brox
Zheng-Wei Zhang
Yanwei Fu
Tong He
53
9
0
18 Sep 2023
FrameRS: A Video Frame Compression Model Composed by Self supervised Video Frame Reconstructor and Key Frame Selector
Qiqian Fu
Guanhong Wang
Gaoang Wang
30
0
0
16 Sep 2023
MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer
Fudong Lin
Summer Crawford
Kaleb Guillot
Yihe Zhang
Yan Chen
...
Tri Setiyono
B. Tubana
Lu Peng
Magdy A. Bayoumi
N. Tzeng
47
20
0
16 Sep 2023
RMP: A Random Mask Pretrain Framework for Motion Prediction
Yi Yang
Qingwen Zhang
Thomas Gilles
Nazre Batool
John Folkesson
61
5
0
16 Sep 2023
AV-MaskEnhancer: Enhancing Video Representations through Audio-Visual Masked Autoencoder
Xingjian Diao
Ming Cheng
Shitong Cheng
VGen
32
8
0
15 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
37
18
0
14 Sep 2023
SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition
Cong Wu
Xiaojun Wu
Josef Kittler
Tianyang Xu
Sara Atito
Muhammad Awais
Zhenhua Feng
43
3
0
11 Sep 2023
CDFSL-V: Cross-Domain Few-Shot Learning for Videos
Sarinda Samarasinghe
Mamshad Nayeem Rizve
Navid Kardan
M. Shah
27
11
0
07 Sep 2023
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers
J. Denize
Mykola Liashuha
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
ViT
30
13
0
03 Sep 2023
RevColV2: Exploring Disentangled Representations in Masked Image Modeling
Qi Han
Yuxuan Cai
Xiangyu Zhang
43
7
0
02 Sep 2023
Self-Supervised Video Transformers for Isolated Sign Language Recognition
Marcelo Sandoval-Castaneda
Yanhong Li
D. Brentari
Karen Livescu
Gregory Shakhnarovich
SLR
25
3
0
02 Sep 2023
CL-MAE: Curriculum-Learned Masked Autoencoders
Neelu Madan
Nicolae-Cătălin Ristea
Kamal Nasrollahi
T. Moeslund
Radu Tudor Ionescu
26
10
0
31 Aug 2023
IndGIC: Supervised Action Recognition under Low Illumination
Jing-Teng Zeng
35
1
0
29 Aug 2023
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction
Umar Khalid
Hasan Iqbal
Saeed Vahidian
Jing Hua
Chong Chen
21
3
0
29 Aug 2023
Self-Supervision for Tackling Unsupervised Anomaly Detection: Pitfalls and Opportunities
Leman Akoglu
Jaemin Yoo
43
1
0
28 Aug 2023
EventTransAct: A video transformer-based framework for Event-camera based action recognition
Tristan de Blegiers
I. Dave
Adeel Yousaf
M. Shah
ViT
46
9
0
25 Aug 2023
Attending Generalizability in Course of Deep Fake Detection by Exploring Multi-task Learning
P. Balaji
Abhijit Das
Srijan Das
A. Dantcheva
CVBM
21
4
0
25 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning
D. Fan
Jue Wang
Shuai Liao
Yi Zhu
Vimal Bhat
H. Santos-Villalobos
M. Rohith
Xinyu Li
VGen
37
19
0
24 Aug 2023
MOFO: MOtion FOcused Self-Supervision for Video Understanding
Mona Ahmadian
Frank Guerin
Andrew Gilbert
44
2
0
23 Aug 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation
Hejun Xiao
Kunyu Peng
Xiangsheng Huang
Alina Roitberg
Hao Li
Zhao Wang
Rainer Stiefelhagen
31
3
0
23 Aug 2023
Audio-Visual Class-Incremental Learning
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
35
28
0
21 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
44
30
0
21 Aug 2023
Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces
Juan Hu
Xin Liao
Difei Gao
Satoshi Tsutsui
Qian Wang
Zheng Qin
Mike Zheng Shou
CVBM
AAML
27
1
0
19 Aug 2023
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos
Zhiqiang Shen
Xiaoxiao Sheng
Hehe Fan
Longguang Wang
Y. Guo
Qiong Liu
Hao-Kai Wen
Xiaoping Zhou
3DPC
20
14
0
18 Aug 2023
Learning to In-paint: Domain Adaptive Shape Completion for 3D Organ Segmentation
Mingjin Chen
Yongkang He
Yongyi Lu
Zhi-Yi Yang
MedIm
23
1
0
17 Aug 2023
Memory-and-Anticipation Transformer for Online Action Understanding
Jiahao Wang
Guo Chen
Yifei Huang
Limin Wang
Tong Lu
OffRL
62
37
0
15 Aug 2023
A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis
Esteve Valls Mascaro
Hyemin Ahn
Dongheui Lee
CVBM
42
4
0
14 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
46
9
0
10 Aug 2023
Spatio-Temporal Encoding of Brain Dynamics with Surface Masked Autoencoders
Simon Dahan
Logan Z. J. Williams
Yourong Guo
Daniel Rueckert
E. C. Robinson
46
0
0
10 Aug 2023