VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
    ViT
arXiv: 2203.12602

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 719 papers shown
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Jingdong Sun
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng
Alexander G. Hauptmann
MLLM
35
29
0
17 Jun 2024
GameVibe: A Multimodal Affective Game Corpus
M. Barthet
Maria Kaselimi
Kosmas Pinitas
Konstantinos Makantasis
Antonios Liapis
Georgios N. Yannakakis
32
3
0
17 Jun 2024
ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
Samar Khanna
Medhanie Irgau
David B. Lobell
Stefano Ermon
VLM
32
4
0
16 Jun 2024
Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation
Pengfei Gu
Yejia Zhang
Huimin Li
Chaoli Wang
Danny Chen
MedIm
66
1
0
15 Jun 2024
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
Weichao Zhao
Wengang Zhou
Hezhen Hu
Min Wang
Houqiang Li
SLR
40
2
0
15 Jun 2024
AVR: Synergizing Foundation Models for Audio-Visual Humor Detection
Sarthak Sharma
Orchid Chetia Phukan
Drishti Singh
Arun Balaji Buduru
Rajesh Sharma
38
0
0
15 Jun 2024
LieRE: Generalizing Rotary Position Encodings
Sophie Ostmeier
Brian Axelrod
Michael E. Moseley
Akshay S. Chaudhari
C. Langlotz
33
1
0
14 Jun 2024
Thoracic Surgery Video Analysis for Surgical Phase Recognition
S. Mateen
Niharika Malvia
Syed Abdul Khader
Danny Wang
Deepti Srinivasan
Chi-Fu Jeffrey Yang
Lana Schumacher
Sandeep Manjanna
23
0
0
13 Jun 2024
Towards Multilingual Audio-Visual Question Answering
Orchid Chetia Phukan
Priyabrata Mallick
Swarup Ranjan Behera
Aalekhya Satya Narayani
Arun Balaji Buduru
Rajesh Sharma
49
0
0
13 Jun 2024
A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder
Lixian Zhang
Yi Zhao
Runmin Dong
Jinxiao Zhang
Shuai Yuan
...
Weijia Li
Wei Liu
Wayne Zhang
Xue Jiang
Haohuan Fu
46
4
0
12 Jun 2024
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
Elaheh Baharlouei
Mahsa Shafaei
Yigeng Zhang
Hugo Jair Escalante
Thamar Solorio
46
0
0
12 Jun 2024
Visual Representation Learning with Stochastic Frame Prediction
Huiwon Jang
Dongyoung Kim
Junsu Kim
Jinwoo Shin
Pieter Abbeel
Younggyo Seo
42
2
0
11 Jun 2024
Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
Donghu Kim
Hojoon Lee
Kyungmin Lee
Dongyoon Hwang
Jaegul Choo
OffRL
35
1
0
10 Jun 2024
CorrMAE: Pre-training Correspondence Transformers with Masked Autoencoder
Tangfei Liao
Xiaoqin Zhang
Guobao Xiao
Min Li
Tao Wang
Mang Ye
45
1
0
09 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Qifeng Chen
Y. Guo
VGen
104
16
0
06 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
44
1
0
05 Jun 2024
Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond
Jiahang Zhang
Lilang Lin
Shuai Yang
Jiaying Liu
SSL
43
0
0
05 Jun 2024
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
Trevine Oorloff
Surya Koppisetti
Nicolo Bonettini
Divyaraj Solanki
Ben Colman
Yaser Yacoob
Ali Shahriyari
Gaurav Bharaj
46
21
0
05 Jun 2024
AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
Lorenzo Mur-Labadia
Ruben Martinez-Cantin
Jose J. Guerrero
G. Farinella
Antonino Furnari
37
4
0
03 Jun 2024
Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models
Georgia Markham
M. Balamurali
Andrew J. Hill
49
1
0
03 Jun 2024
DroneVis: Versatile Computer Vision Library for Drones
Ahmed Heakl
F. Youssef
Victor Parque
Walid Gomaa
AI4TS
46
1
0
01 Jun 2024
MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition
Weichao Zhao
Hezhen Hu
Wen-gang Zhou
Yunyao Mao
Min Wang
Houqiang Li
SLR
44
8
0
31 May 2024
Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
Masashi Hatano
Ryo Hachiuma
Ryoske Fujii
Hideo Saito
EgoV
42
4
0
30 May 2024
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
Haoxing Chen
Yan Hong
Zizheng Huang
Zhuoer Xu
Zhangxuan Gu
...
Jun Lan
Huijia Zhu
Jianfu Zhang
Weiqiang Wang
Huaxiong Li
Mamba
83
16
0
30 May 2024
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
Ryoske Fujii
Masashi Hatano
Hideo Saito
Hiroki Kajita
36
6
0
30 May 2024
Visualizing the loss landscape of Self-supervised Vision Transformer
Youngwan Lee
Jeffrey Willette
Jonghee Kim
Sung Ju Hwang
ViT
41
1
0
28 May 2024
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
Mustafa Shukor
Matthieu Cord
71
5
0
26 May 2024
Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception
Shuangpeng Han
Ziyu Wang
Mengmi Zhang
36
0
0
26 May 2024
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Jiaqi Wang
VLM
39
41
0
25 May 2024
ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Sucheng Ren
Hongru Zhu
Chen Wei
Yijiang Li
Alan L. Yuille
Cihang Xie
AI4TS
VGen
SSL
59
1
0
24 May 2024
SIAVC: Semi-Supervised Framework for Industrial Accident Video Classification
Zuoyong Li
Qinghua Lin
Haoyi Fan
Tiesong Zhao
David Zhang
39
0
0
23 May 2024
Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations
Mohammed Baharoon
Jonathan Klein
D. L. Michels
SSL
VLM
44
0
0
23 May 2024
BIMM: Brain Inspired Masked Modeling for Video Representation Learning
Zhifan Wan
Jie Zhang
Chang-bo Li
Shiguang Shan
69
0
0
21 May 2024
Open-Vocabulary Spatio-Temporal Action Detection
Tao Wu
Shuqiu Ge
Jie Qin
Gangshan Wu
Limin Wang
ObjD
28
5
0
17 May 2024
Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
Qi Jia
Baoyu Fan
Cong Xu
Lu Liu
Liang Jin
Guoguang Du
Zhenhua Guo
Yaqian Zhao
Xuanjing Huang
Rengang Li
37
0
0
15 May 2024
A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection
Matthew Korban
Peter Youngs
Scott T. Acton
ViT
29
6
0
13 May 2024
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset
Sushant Gautam
Mehdi Houshmand Sarkhoosh
Jan Held
Cise Midoglu
A. Cioppa
Silvio Giancola
Vajira Thambawita
Michael A. Riegler
P. Halvorsen
Mubarak Shah
36
4
0
12 May 2024
Learning Latent Dynamic Robust Representations for World Models
Ruixiang Sun
Hongyu Zang
Xin-hui Li
Riashat Islam
39
5
0
10 May 2024
HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions
Jiabin Chen
Fei Xu
Yikun Gu
Li Chen
Fangming Liu
Zhi Zhou
29
6
0
09 May 2024
Hierarchical Space-Time Attention for Micro-Expression Recognition
Haihong Hao
Shuo Wang
Huixia Ben
Yanbin Hao
Yansong Wang
Weiwei Wang
31
1
0
06 May 2024
MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
Vishal Nedungadi
A. Kariryaa
Stefan Oehmcke
Serge Belongie
Christian Igel
Nico Lang
45
25
0
04 May 2024
AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition
Meiqi Cao
Rui Yan
Xiangbo Shu
Guangzhao Dai
Yazhou Yao
Guo-Sen Xie
36
0
0
04 May 2024
Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers
Saahil Islam
Venkatesh N. Murthy
Dominik Neumann
Badhan Kumar Das
Puneet Sharma
Andreas Maier
Dorin Comaniciu
Florin-Cristian Ghesu
34
1
0
02 May 2024
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition
Peihao Xiang
Chaohao Lin
Kaida Wu
Ou Bai
34
3
0
28 Apr 2024
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Guoying Zhao
Zhuofan Wen
Siyuan Zhang
...
Bin Liu
Min Zhang
Guoying Zhao
Björn W. Schuller
Jianhua Tao
VLM
41
11
0
26 Apr 2024
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
Xiaohong Liu
Xiongkuo Min
Guangtao Zhai
Chunyi Li
Tengchuan Kou
...
Qi Yan
Youran Qu
Xiaohui Zeng
Lele Wang
Renjie Liao
58
29
0
25 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024
MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis
Jiaxin Zhuang
Linshan Wu
Qiong Wang
V. Vardhanabhuti
Lin Luo
Hao Chen
57
4
0
24 Apr 2024
On the Content Bias in Fréchet Video Distance
Jason S. Hoffman
Aniruddha Mahapatra
Gaurav Parmar
Jun-Yan Zhu
Jia-Bin Huang
EGVM
50
15
0
18 Apr 2024
Predicting Long-horizon Futures by Conditioning on Geometry and Time
Tarasha Khurana
Deva Ramanan
AI4TS
55
0
0
17 Apr 2024