ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.12602
  4. Cited By
VideoMAE: Masked Autoencoders are Data-Efficient Learners for
  Self-Supervised Video Pre-Training

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
    ViT
ArXivPDFHTML

Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"

50 / 719 papers shown
Title
NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods
  and Results
NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results
Xin Li
Kun Yuan
Yajing Pei
Yiting Lu
Ming Sun
...
Kele Xu
Qisheng Xu
Tao Sun
Zhi-Guo Ding
Yuhan Hu
54
23
0
17 Apr 2024
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar
Arya Bakhtiar
Danny Tran
Antonio Loquercio
Jathushan Rajasegaran
Yann LeCun
Amir Globerson
Trevor Darrell
EgoV
41
4
0
15 Apr 2024
STMixer: A One-Stage Sparse Action Detector
STMixer: A One-Stage Sparse Action Detector
Tao Wu
Mengqing Cao
Ziteng Gao
Gangshan Wu
Limin Wang
29
0
0
15 Apr 2024
The 8th AI City Challenge
The 8th AI City Challenge
Shuo Wang
D. Anastasiu
Zhenghang Tang
Ming-Ching Chang
Yue Yao
...
Xunlei Wu
S. Pusegaonkar
Yizhou Wang
Sujit Biswas
Rama Chellappa
53
32
0
15 Apr 2024
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision
  Transformers
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
Diana-Nicoleta Grigore
Mariana-Iuliana Georgescu
J. A. Justo
T. Johansen
Andreea-Iuliana Ionescu
Radu Tudor Ionescu
36
0
0
14 Apr 2024
An Animation-based Augmentation Approach for Action Recognition from
  Discontinuous Video
An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video
Xingyu Song
Zhan Li
Shi Chen
Xin-Qiang Cai
K. Demachi
33
2
0
10 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video
  Understanding
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
83
89
0
08 Apr 2024
Social-MAE: Social Masked Autoencoder for Multi-person Motion
  Representation Learning
Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning
Mahsa Ehsanpour
Ian Reid
Hamid Rezatofighi
ViT
40
0
0
08 Apr 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
46
9
0
08 Apr 2024
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng
Yujie Zhong
Chengjian Feng
Lin Ma
63
7
0
07 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports
  Videos
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
54
3
0
06 Apr 2024
SalFoM: Dynamic Saliency Prediction with Video Foundation Models
SalFoM: Dynamic Saliency Prediction with Video Foundation Models
Morteza Moradi
Mohammad Moradi
Francesco Rundo
C. Spampinato
Ali Borji
S. Palazzo
44
1
0
03 Apr 2024
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation
  Learning for Neural Radiance Fields
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Muhammad Zubair Irshad
Sergey Zakahrov
Vitor Campagnolo Guizilini
Adrien Gaidon
Z. Kira
Rares Andrei Ambrus
ViT
49
12
0
01 Apr 2024
360+x: A Panoptic Multi-modal Scene Understanding Dataset
360+x: A Panoptic Multi-modal Scene Understanding Dataset
Hao Chen
Yuqi Hou
Chenyuan Qu
Irene Testini
Xiaohan Hong
Jianbo Jiao
31
7
0
01 Apr 2024
HypeBoy: Generative Self-Supervised Representation Learning on
  Hypergraphs
HypeBoy: Generative Self-Supervised Representation Learning on Hypergraphs
Sunwoo Kim
Shinhwan Kang
Fanchen Bu
Soo Yong Lee
Jaemin Yoo
Kijung Shin
SSL
37
11
0
31 Mar 2024
ST-LLM: Large Language Models Are Effective Temporal Learners
ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu
Chen Li
Haoran Tang
Yixiao Ge
Ying Shan
Ge Li
51
70
0
30 Mar 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
40
2
0
28 Mar 2024
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Alexandre Eymaël
Renaud Vandeghen
A. Cioppa
Silvio Giancola
Guohao Li
Marc Van Droogenbroeck
ViT
48
6
0
26 Mar 2024
Elysium: Exploring Object-level Perception in Videos via MLLM
Elysium: Exploring Object-level Perception in Videos via MLLM
Hang Wang
Yanjie Wang
Yongjie Ye
Yuxiang Nie
Can Huang
MLLM
42
19
0
25 Mar 2024
Adversarially Masked Video Consistency for Unsupervised Domain
  Adaptation
Adversarially Masked Video Consistency for Unsupervised Domain Adaptation
Xiaoyu Zhu
Junwei Liang
Po-Yao Huang
Alex Hauptmann
32
1
0
24 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided
  Training
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
42
1
0
24 Mar 2024
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for
  Faster Inference
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
38
2
0
24 Mar 2024
Edit3K: Universal Representation Learning for Video Editing Components
Edit3K: Universal Representation Learning for Video Editing Components
Xin Gu
Libo Zhang
Fan Chen
Longyin Wen
Yufei Wang
Tiejian Luo
Sijie Zhu
43
4
0
24 Mar 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang
Guo Chen
Jilan Xu
Mingfang Zhang
Lijin Yang
...
Hongjie Zhang
Lu Dong
Yali Wang
Limin Wang
Yu Qiao
EgoV
71
38
0
24 Mar 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video
  Understanding
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
42
49
0
22 Mar 2024
VidLA: Video-Language Alignment at Scale
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM
AI4TS
58
4
0
21 Mar 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
56
7
0
21 Mar 2024
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi
Sanghyeok Lee
Jaewon Chu
Minhyuk Choi
Hyunwoo J. Kim
MoMe
ViT
55
12
0
20 Mar 2024
N-Modal Contrastive Losses with Applications to Social Media Data in
  Trimodal Space
N-Modal Contrastive Losses with Applications to Social Media Data in Trimodal Space
William Theisen
Walter J. Scheirer
34
1
0
18 Mar 2024
A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity
  Recognition
A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition
Abhi Kamboj
Minh Do
29
1
0
17 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for
  Video Understanding
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
73
0
14 Mar 2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Jongsuk Kim
Hyeongkeun Lee
Kyeongha Rho
Junmo Kim
Joon Son Chung
34
4
0
14 Mar 2024
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving
  Representation Learning
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
49
2
0
13 Mar 2024
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with
  Focused Masked Autoencoders
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Soumen Basu
Mayuna Gupta
Chetan Madan
Pankaj Gupta
Chetan Arora
36
4
0
13 Mar 2024
CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression
CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression
Xinjie Zhang
Shenyuan Gao
Zhening Liu
Jiawei Shao
Xingtong Ge
Dailan He
Tongda Xu
Yan Wang
Jun Zhang
48
1
0
13 Mar 2024
Spatiotemporal Representation Learning for Short and Long Medical Image
  Time Series
Spatiotemporal Representation Learning for Short and Long Medical Image Time Series
Chengzhi Shen
M. Menten
Hrvoje Bogunović
U. Schmidt-Erfurth
H. Scholl
S. Sivaprasad
A. Lotery
Daniel Rueckert
Paul Hager
Robbie Holland
40
2
0
12 Mar 2024
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained
  Models for Spatiotemporal Modeling
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling
W. G. C. Bandara
Vishal M. Patel
VPVLM
VLM
36
1
0
11 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
42
184
0
11 Mar 2024
Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation Models
Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation Models
Philip Harris
Michael Kagan
J. Krupa
B. Maier
Nathaniel Woodward
46
13
0
11 Mar 2024
Using Fiber Optic Bundles to Miniaturize Vision-Based Tactile Sensors
Using Fiber Optic Bundles to Miniaturize Vision-Based Tactile Sensors
Julia Di
Zdravko Dugonjic
Will Fu
Tingfan Wu
Romeo Mercado
...
Richard E. Fan
G. Sonn
M. Cutkosky
Mike Lambeta
Roberto Calandra
40
9
0
08 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
53
10
0
08 Mar 2024
VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT
VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT
Yifang Xu
Yunzhuo Sun
Zien Xie
Benxiang Zhai
Sidan Du
56
6
0
04 Mar 2024
Data-efficient Event Camera Pre-training via Disentangled Masked
  Modeling
Data-efficient Event Camera Pre-training via Disentangled Masked Modeling
Zhenpeng Huang
Chao Li
Hao Chen
Yongjian Deng
Yifeng Geng
Limin Wang
55
2
0
01 Mar 2024
VideoMAC: Video Masked Autoencoders Meet ConvNets
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei
Tao Chen
XiRuo Jiang
Huafeng Liu
Zeren Sun
Yazhou Yao
VGen
52
9
0
29 Feb 2024
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of
  Foundation Models for Open-World Video Recognition
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
34
3
0
29 Feb 2024
SFTformer: A Spatial-Frequency-Temporal Correlation-Decoupling
  Transformer for Radar Echo Extrapolation
SFTformer: A Spatial-Frequency-Temporal Correlation-Decoupling Transformer for Radar Echo Extrapolation
Liangyu Xu
Wanxuan Lu
Hongfeng Yu
Fanglong Yao
Xian Sun
Kun Fu
50
5
0
28 Feb 2024
MedContext: Learning Contextual Cues for Efficient Volumetric Medical
  Segmentation
MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation
Hanan Gani
Muzammal Naseer
Fahad Khan
Salman Khan
28
0
0
27 Feb 2024
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form
  Video-Text Understanding
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
Yuxuan Wang
Yueqian Wang
Pengfei Wu
Jianxin Liang
Dongyan Zhao
Zilong Zheng
VLM
36
9
0
25 Feb 2024
ViSTec: Video Modeling for Sports Technique Recognition and Tactical
  Analysis
ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis
Yuchen He
Zeqing Yuan
Yihong Wu
Liqi Cheng
Dazhen Deng
Yingcai Wu
43
4
0
25 Feb 2024
Data-Efficient Operator Learning via Unsupervised Pretraining and
  In-Context Learning
Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning
Wuyang Chen
Jialin Song
Pu Ren
Shashank Subramanian
Dmitriy Morozov
Michael W. Mahoney
AI4CE
54
12
0
24 Feb 2024
Previous
123...678...131415
Next