VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
    ViT
arXiv: 2203.12602

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 719 papers shown
Membership Inference Attack Against Masked Image Modeling
Zehan Li
Xinlei He
Ning Yu
Yang Zhang
42
1
0
13 Aug 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
72
6
0
13 Aug 2024
Deep Multimodal Collaborative Learning for Polyp Re-Identification
Suncheng Xiang
Jincheng Li
Zhengjie Zhang
Shilun Cai
Jiale Guan
Dahong Qian
33
0
0
12 Aug 2024
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning
Rex Liu
Xin Liu
40
1
0
08 Aug 2024
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
Seok Hwan Lee
Taein Son
Soo Won Seo
Jisong Kim
Jun Won Choi
52
0
0
07 Aug 2024
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation
Xiaofeng Mao
Zhengkai Jiang
Qilin Wang
Chencan Fu
Jiangning Zhang
Jiafu Wu
Yabiao Wang
Chengjie Wang
Wei Li
Mingmin Chi
80
4
0
06 Aug 2024
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation
Xin Liu
Chao Hao
Zitong Yu
Huanjing Yue
Jingyu Yang
41
1
0
05 Aug 2024
Past Movements-Guided Motion Representation Learning for Human Motion Prediction
Junyu Shi
Baoxuan Wang
3DH
39
0
0
04 Aug 2024
Text-Guided Video Masked Autoencoder
D. Fan
Jue Wang
Shuai Liao
Zhikang Zhang
Vimal Bhat
Xinyu Li
VGen
36
3
0
01 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
46
5
0
31 Jul 2024
Hyper-parameter tuning for text guided image editing
Shiwen Zhang
DiffM
45
0
0
31 Jul 2024
Mitral Regurgitation Recogniton based on Unsupervised Out-of-Distribution Detection with Residual Diffusion Amplification
Zhe Liu
Xiliang Zhu
Tong Han
Yuhao Huang
Jian Wang
M. Werman
Fang Wang
Dong Ni
Zhongshan Gou
Xin Yang
52
0
0
31 Jul 2024
PersonalityScanner: Exploring the Validity of Personality Assessment Based on Multimodal Signals in Virtual Reality
Xintong Zhang
Di Lu
Huiqi Hu
Nan Jiang
Xianhao Yu
Jinan Xu
Yujia Peng
Qing Li
Wenjuan Han
36
1
0
29 Jul 2024
Classification Matters: Improving Video Action Detection with Class-Specific Attention
Jinsung Lee
Taeoh Kim
Inwoong Lee
Minho Shim
Dongyoon Wee
Minsu Cho
Suha Kwak
54
0
0
29 Jul 2024
Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
Pulkit Kumar
Namitha Padmanabhan
Luke Luo
Sai Saketh Rambhatla
Abhinav Shrivastava
45
4
0
25 Jul 2024
MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos
Zsófia Katona
Seyed Sahand Mohamadi Ziabari
F. Karimi Nejadasl
30
0
0
25 Jul 2024
OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
Andrew Zisserman
34
2
0
24 Jul 2024
QPT V2: Masked Image Modeling Advances Visual Scoring
Qizhi Xie
Kun Yuan
Yunpeng Qu
Mingda Wu
Ming Sun
Chao Zhou
Jihong Zhu
42
3
0
23 Jul 2024
Probing Fine-Grained Action Understanding and Cross-View Generalization of Foundation Models
Thinesh Thiyakesan Ponbagavathi
Kunyu Peng
Alina Roitberg
45
1
0
22 Jul 2024
SIGMA: Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
55
3
0
22 Jul 2024
Towards Robust Vision Transformer via Masked Adaptive Ensemble
Fudong Lin
Jiadong Lou
Xu Yuan
Nianfeng Tzeng
ViT
AAML
36
1
0
22 Jul 2024
CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting
Ryoske Fujii
Ryo Hachiuma
Hideo Saito
40
1
0
20 Jul 2024
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
80
3
0
20 Jul 2024
Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition
Yurong Zhang
Honghao Chen
Xinyu Zhang
Xiangxiang Chu
Li Song
47
1
0
19 Jul 2024
Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective
Changwen Zheng
Wenwen Qiang
Jianqi Zhang
Changwen Zheng
Jingyao Wang
SSL
66
0
0
19 Jul 2024
Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
Wufei Ma
Kai Li
Zhongshi Jiang
Moustafa Meshry
Qihao Liu
Huiyu Wang
Christian Hane
Alan L. Yuille
VGen
42
1
0
18 Jul 2024
Towards Understanding Unsafe Video Generation
Yan Pang
Aiping Xiong
Yang Zhang
Tianhao Wang
EGVM
34
2
0
17 Jul 2024
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju
Haicheng Wang
Haozhe Cheng
Xu Chen
Zhonghua Zhai
Weilin Huang
Jinsong Lan
Shuai Xiao
Bo Zheng
VLM
49
5
0
16 Jul 2024
AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder
Qiaoqiao Jin
Rui Shi
Yishun Dou
Bingbing Ni
CVBM
53
0
0
16 Jul 2024
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang
Zihao Xiao
Shikun Li
Fanzhao Lin
Jianmin Li
Shiming Ge
CVBM
43
10
0
15 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
42
7
0
11 Jul 2024
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Wentao Zhang
Junliang Guo
Tianyu He
Li Zhao
Linli Xu
Jiang Bian
47
3
0
10 Jul 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
52
1
0
09 Jul 2024
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Mingfang Zhang
Yifei Huang
Ruicong Liu
Yoichi Sato
51
4
0
09 Jul 2024
D-MASTER: Mask Annealed Transformer for Unsupervised Domain Adaptation in Breast Cancer Detection from Mammograms
Tajamul Ashraf
K. Rangarajan
Mohit Gambhir
Richa Gabha
Chetan Arora
MedIm
44
1
0
09 Jul 2024
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li
Pengyu Liu
Pengyu Liu
Guoliang Chen
Zhiliang Wu
Hehe Fan
Meng Wang
47
7
0
07 Jul 2024
CBM: Curriculum by Masking
Andrei Jarca
Florinel-Alin Croitoru
Radu Tudor Ionescu
40
0
0
06 Jul 2024
ZARRIO @ Ego4D Short Term Object Interaction Anticipation Challenge: Leveraging Affordances and Attention-based models for STA
Lorenzo Mur-Labadia
Ruben Martinez-Cantin
J. Guerrero-Campo
G. Farinella
31
0
0
05 Jul 2024
QueryMamba: A Mamba-Based Encoder-Decoder Architecture with a Statistical Verb-Noun Interaction Module for Video Action Forecasting @ Ego4D Long-Term Action Anticipation Challenge 2024
Zeyun Zhong
Manuel Martin
Frederik Diederichs
Juergen Beyerer
42
4
0
04 Jul 2024
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
Etai Littwin
Omid Saremi
Madhu Advani
Vimal Thilak
Preetum Nakkiran
Chen Huang
Joshua Susskind
44
3
0
03 Jul 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
40
4
0
03 Jul 2024
Mask and Compress: Efficient Skeleton-based Action Recognition in Continual Learning
Matteo Mosconi
Andriy Sorokin
Aniello Panariello
Angelo Porrello
Jacopo Bonato
Marco Cotogni
Luigi Sabetta
Simone Calderara
Rita Cucchiara
CLL
42
1
0
01 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
47
52
0
30 Jun 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
Hao Fei
Tat-Seng Chua
Shuicheng Yan
AI4TS
47
40
0
27 Jun 2024
Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model
Zhuo Zheng
Stefano Ermon
Dongjun Kim
Liangpei Zhang
Yanfei Zhong
DiffM
45
20
0
26 Jun 2024
Video Occupancy Models
Manan Tomar
Philippe Hansen-Estruch
Philip Bachman
Alex Lamb
John Langford
Matthew E. Taylor
Sergey Levine
51
1
0
25 Jun 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
57
4
0
21 Jun 2024
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning
Jiali Cheng
Hadi Amiri
BDL
45
3
0
21 Jun 2024
Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis
Md. Saiful Islam
Tariq Adnan
Jan Freyberg
Sangwu Lee
Abdelrahman Abdelkader
...
Cathe Schwartz
Karen Jaffe
Ruth B. Schneider
E. R. Dorsey
Ehsan Hoque
77
0
0
21 Jun 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
52
4
0
20 Jun 2024