ResearchTrend.AI

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
    ViT

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 719 papers shown
Title
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Xiaoyu Shi
Zhaoyang Huang
Dasong Li
Manyuan Zhang
Ka Chun Cheung
Simon See
Hongwei Qin
Jifeng Dai
Hongsheng Li
27
82
0
02 Mar 2023
Valid Information Guidance Network for Compressed Video Quality Enhancement
Xuan Sun
Ziyue Zhang
Guannan Chen
Dan Zhu
45
0
0
28 Feb 2023
Remote Sensing Scene Classification with Masked Image Modeling (MIM)
Liya Wang
A. Tien
35
3
0
28 Feb 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Percy Liang
LM&Ro
SSL
47
145
0
24 Feb 2023
Delving into Identify-Emphasize Paradigm for Combating Unknown Bias
Bowen Zhao
Chen Chen
Qian-Wei Wang
Anfeng He
Shutao Xia
42
1
0
22 Feb 2023
Towards Efficient Visual Adaption via Structural Re-parameterization
Gen Luo
Minglang Huang
Yiyi Zhou
Xiaoshuai Sun
Guannan Jiang
Zhiyu Wang
Rongrong Ji
VLM
VPVLM
14
78
0
16 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
31
7
0
16 Feb 2023
CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
Jiang Yang
Sheng Guo
Gangshan Wu
Limin Wang
VLM
31
7
0
13 Feb 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
Chong Chen
Mu Li
ViT
58
144
0
06 Feb 2023
Multi-View Masked World Models for Visual Robotic Manipulation
Younggyo Seo
Junsup Kim
Stephen James
Kimin Lee
Jinwoo Shin
Pieter Abbeel
VGen
25
56
0
05 Feb 2023
Representation Deficiency in Masked Language Modeling
Yu Meng
Jitin Krishnan
Sinong Wang
Qifan Wang
Yuning Mao
Han Fang
Marjan Ghazvininejad
Jiawei Han
Luke Zettlemoyer
90
7
0
04 Feb 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Liya Wang
A. Tien
49
7
0
28 Jan 2023
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
Razvan-George Pasca
Alexey Gavryushin
Muhammad Hamza
Yen-Ling Kuo
Kaichun Mo
Luc Van Gool
Otmar Hilliges
Xi Wang
33
14
0
22 Jan 2023
Ti-MAE: Self-Supervised Masked Time Series Autoencoders
Zhe Li
Zhongwen Rao
Lujia Pan
Pengyun Wang
Zenglin Xu
AI4TS
31
49
0
21 Jan 2023
CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition
Cheng Lu
Xiaojie Jin
Zhicheng Huang
Qibin Hou
Ming-Ming Cheng
Jiashi Feng
37
8
0
15 Jan 2023
A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Jie Gui
Tuo Chen
Jing Zhang
Qiong Cao
Zhe Sun
Haoran Luo
Dacheng Tao
31
126
0
13 Jan 2023
STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition
Ming Li
Xiangyu Xu
Hehe Fan
Pan Zhou
Jun Liu
Jia-Wei Liu
Jiahe Li
Jussi Keppo
Mike Zheng Shou
Shuicheng Yan
ViT
PICV
51
13
0
08 Jan 2023
Ego-Only: Egocentric Action Detection without Exocentric Transferring
Huiyu Wang
Mitesh Singh
Lorenzo Torresani
EgoV
72
24
0
03 Jan 2023
Ponder: Point Cloud Pre-training via Neural Rendering
Di Huang
Sida Peng
Tong He
Honghui Yang
Xiaowei Zhou
Wanli Ouyang
SSL
3DPC
39
41
0
31 Dec 2022
Transformers in Action Recognition: A Review on Temporal Modeling
Elham Shabaninia
Hossein Nezamabadi-pour
Fatemeh Shafizadegan
ViT
29
8
0
29 Dec 2022
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
J. Denize
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
SSL
19
6
0
21 Dec 2022
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning
Huimin Wu
Chenyang Lei
Xiao Sun
Pengju Wang
Qifeng Chen
Kwang-Ting Cheng
Stephen Lin
Zhirong Wu
MQ
38
5
0
19 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
34
73
0
15 Dec 2022
Policy Adaptation from Foundation Model Feedback
Yuying Ge
Annabella Macaluso
Erran L. Li
Ping Luo
Xiaolong Wang
LM&Ro
27
12
0
14 Dec 2022
THMA: Tencent HD Map AI System for Creating HD Map Annotations
Kun Tang
Xu Cao
Zhipeng Cao
Tongxi Zhou
Erlong Li
...
Shengtao Zou
Chang-ling Liu
Shuqi Mei
Elena Sizikova
Chao Zheng
25
12
0
14 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
M. Pantic
SSL
45
49
0
12 Dec 2022
Recurrent Vision Transformers for Object Detection with Event Cameras
Mathias Gehrig
Davide Scaramuzza
40
122
0
11 Dec 2022
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Joey Tianyi Zhou
Gedas Bertasius
VLM
35
79
0
09 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
39
43
0
09 Dec 2022
Deep Architectures for Content Moderation and Movie Content Rating
Fatih Çagatay Akyön
A. Temizel
38
4
0
08 Dec 2022
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
VGen
32
87
0
08 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data
Roei Herzig
Ofir Abramovich
Elad Ben-Avraham
Assaf Arbelle
Leonid Karlinsky
Ariel Shamir
Trevor Darrell
Amir Globerson
41
16
0
08 Dec 2022
SimVTP: Simple Video Text Pre-training with Masked Autoencoders
Yue Ma
Tianyu Yang
Yin Shan
Xiu Li
41
27
0
07 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
38
54
0
06 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
57
311
0
06 Dec 2022
GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds
Honghui Yang
Tong He
Jiaheng Liu
Huaguan Chen
Boxi Wu
Binbin Lin
Xiaofei He
Wanli Ouyang
54
58
0
06 Dec 2022
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu
Biaolong Chen
Yue Liao
Shuwen Xiao
Wenyu Sun
Xiaobo Li
Yousong Zhu
Jinqiao Wang
Si Liu
CLIP
32
11
0
02 Dec 2022
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
30
318
0
01 Dec 2022
Spatio-Temporal Crop Aggregation for Video Representation Learning
Sepehr Sameni
Simon Jenni
Paolo Favaro
24
3
0
30 Nov 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar
Ali Etemad
19
20
0
25 Nov 2022
SVFormer: Semi-supervised Video Transformer for Action Recognition
Zhen Xing
Qi Dai
Hang-Rui Hu
Jingjing Chen
Zuxuan Wu
Yu-Gang Jiang
ViT
33
69
0
23 Nov 2022
Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
Haoxin Li
Yuan Liu
Hanwang Zhang
Boyang Li
30
15
0
23 Nov 2022
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Tsu-jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
VGen
61
37
0
23 Nov 2022
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
3DGS
34
15
0
21 Nov 2022
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
Sun-Kyoo Hwang
Jaehong Yoon
Youngwan Lee
Sung Ju Hwang
31
6
0
19 Nov 2022
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022
Jiachen Lei
Shuang Ma
Zhongjie Ba
Sai H. Vemprala
Ashish Kapoor
Kui Ren
EgoV
12
0
0
18 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
30
107
0
17 Nov 2022
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
Guo Chen
Sen Xing
Zhe Chen
Yi Wang
Kunchang Li
...
Hongjie Zhang
Tong Lu
Yali Wang
Limin Wang
Yu Qiao
41
46
0
17 Nov 2022
Exploring adaptation of VideoMAE for Audio-Visual Diarization & Social @ Ego4d Looking at me Challenge
Yinan He
Guo Chen
14
0
0
17 Nov 2022
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders
W. G. C. Bandara
Naman Patel
A. Gholami
Mehdi Nikkhah
M. Agrawal
Vishal M. Patel
25
39
0
16 Nov 2022