arXiv:2203.12602
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
Papers citing "VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"
50 / 719 papers shown
Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity
Zijiao Chen
Jiaxin Qing
J. Zhou
DiffM
VGen
37
55
0
19 May 2023
SurgMAE: Masked Autoencoders for Long Surgical Video Analysis
Muhammad Abdullah Jamal
Omid Mohareri
25
5
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
48
116
0
18 May 2023
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot
Aanisha Bhattacharya
Yaman Kumar Singla
Balaji Krishnamurthy
R. Shah
Changyou Chen
VGen
32
11
0
16 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
34
2
0
13 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li
Huan Wang
Muxia Sun
LM&MA
AI4TS
AI4CE
29
46
0
10 May 2023
VideoChat: Chat-Centric Video Understanding
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
69
534
0
10 May 2023
Visual Tuning
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
62
38
0
10 May 2023
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
56
855
0
09 May 2023
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
Zhaoyang Liu
Yinan He
Wenhai Wang
Weiyun Wang
Yi Wang
...
Yali Wang
Limin Wang
Ping Luo
Jifeng Dai
Yu Qiao
LRM
MLLM
41
79
0
09 May 2023
PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos
Zhiqiang Shen
Xiaoxiao Sheng
Longguang Wang
Y. Guo
Qiong Liu
Xiaoping Zhou
3DPC
SSL
35
14
0
06 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
81
6
0
05 May 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa
FedML
SSL
50
275
0
24 Apr 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers
A. Gritsenko
Xuehan Xiong
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
Anurag Arnab
ViT
42
13
0
24 Apr 2023
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Johannes Lehner
Benedikt Alkin
Andreas Fürst
Elisabeth Rumetshofer
Lukas Miklautz
Sepp Hochreiter
31
18
0
20 Apr 2023
Transformer-Based Visual Segmentation: A Survey
Xiangtai Li
Henghui Ding
Haobo Yuan
Wenwei Zhang
Jiangmiao Pang
Guangliang Cheng
Kai-xiang Chen
Ziwei Liu
Chen Change Loy
ViT
MedIm
42
132
0
19 Apr 2023
MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning
Zheng Lian
Haiyang Sun
Guoying Zhao
Kang Chen
Mingyu Xu
...
Meng Wang
Min Zhang
Guoying Zhao
Björn W. Schuller
Jianhua Tao
45
48
0
18 Apr 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
38
14
0
17 Apr 2023
The 7th AI City Challenge
M. Naphade
Shuo Wang
D. Anastasiu
Zhenghang Tang
Ming-Ching Chang
...
Alice Li
Shangru Li
Krishna Kunadharaju
Shenxin Jiang
Ramalingam Chellappa
49
53
0
15 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
143
3,070
0
14 Apr 2023
Hard Patches Mining for Masked Image Modeling
Haochen Wang
Kaiyou Song
Junsong Fan
Yuxi Wang
Jin Xie
Zhaoxiang Zhang
37
59
0
12 Apr 2023
Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection
Wei-Jhe Huang
Jheng-Hsien Yeh
Min-Hung Chen
Gueter Josmy Faure
S. Lai
44
3
0
10 Apr 2023
On Robustness in Multimodal Learning
Brandon McKinzie
Joseph Cheng
Vaishaal Shankar
Yinfei Yang
Jonathon Shlens
Alexander Toshev
40
2
0
10 Apr 2023
StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation
Francesco Ragusa
G. Farinella
Antonino Furnari
21
18
0
08 Apr 2023
Self-Supervised Video Similarity Learning
Giorgos Kordopatis-Zilos
Giorgos Tolias
Christos Tzelepis
I. Kompatsiaris
Ioannis Patras
Symeon Papadopoulos
SSL
37
8
0
06 Apr 2023
On the Benefits of 3D Pose and Tracking for Human Action Recognition
Jathushan Rajasegaran
Georgios Pavlakos
Angjoo Kanazawa
Christoph Feichtenhofer
Jitendra Malik
39
30
0
03 Apr 2023
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Yong-Lu Li
Xiaoqian Wu
Xinpeng Liu
Zehao Wang
Yiming Dou
...
Junyi Zhang
Yixing Li
Jingru Tan
Xudong Lu
Cewu Lu
27
17
0
02 Apr 2023
Complementary Random Masking for RGB-Thermal Semantic Segmentation
Ukcheol Shin
Kyunghyun Lee
In So Kweon
Jean Oh
32
20
0
30 Mar 2023
Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning
Chongjian Ge
Jiangliu Wang
Zhan Tong
Shoufa Chen
Yibing Song
Ping Luo
SSL
22
27
0
30 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
73
329
0
29 Mar 2023
Language-Guided Audio-Visual Source Separation via Trimodal Consistency
Reuben Tan
Arijit Ray
Andrea Burns
Bryan A. Plummer
Justin Salamon
Oriol Nieto
Bryan C. Russell
Kate Saenko
23
21
0
28 Mar 2023
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
I. Dave
Mamshad Nayeem Rizve
Chong Chen
M. Shah
TTA
46
16
0
28 Mar 2023
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
25
3
0
28 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
57
156
0
28 Mar 2023
Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval
Qingguo Chen
Shilun Cai
C. Cai
Zefang Yu
Xiaobo Li
Suncheng Xiang
36
7
0
28 Mar 2023
Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder
Tao Sun
Lu Pang
Chao Chen
Haibin Ling
AAML
43
9
0
27 Mar 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
41
95
0
25 Mar 2023
3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
Lei Wang
Piotr Koniusz
ViT
28
45
0
25 Mar 2023
Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck
Jongheon Jeong
Sihyun Yu
Hankook Lee
Jinwoo Shin
AAML
44
0
0
24 Mar 2023
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
Runsen Xu
Tai Wang
Wenwei Zhang
Runjian Chen
Jinkun Cao
Jiangmiao Pang
Dahua Lin
3DPC
39
30
0
23 Mar 2023
A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition
Andong Deng
Taojiannan Yang
Chong Chen
AI4TS
27
13
0
23 Mar 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh
Quentin Duval
Kalyan Vasudev Alwala
Haoqi Fan
Vaibhav Aggarwal
...
Piotr Dollár
Christoph Feichtenhofer
Ross B. Girshick
Rohit Girdhar
Ishan Misra
LRM
126
63
0
23 Mar 2023
POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery
Ce Zheng
Xianpeng Liu
Guo-Jun Qi
Chong Chen
3DH
113
33
0
23 Mar 2023
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
J. Hernandez
Ruben Villegas
Vicente Ordonez
SSL
33
4
0
21 Mar 2023
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
Fida Mohammad Thoker
Hazel Doughty
Cees G. M. Snoek
ViT
48
9
0
20 Mar 2023
Dual-path Adaptation from Image to Video Transformers
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
ViT
21
37
0
17 Mar 2023
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models
Xufeng Zhao
Mengdi Li
C. Weber
Muhammad Burhan Hafez
S. Wermter
LLMAG
LM&Ro
LRM
112
47
0
14 Mar 2023
Traj-MAE: Masked Autoencoders for Trajectory Prediction
Hao Chen
Jiaze Wang
Kun Shao
Furui Liu
Jianye Hao
Chenyong Guan
Guangyong Chen
Pheng-Ann Heng
72
38
0
12 Mar 2023
Improving Masked Autoencoders by Learning Where to Mask
Haijia Chen
Wendong Zhang
Yunbo Wang
Xiaokang Yang
SSL
20
20
0
12 Mar 2023
Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection
Juan Hu
Xin Liao
Difei Gao
Satoshi Tsutsui
Qian Wang
Zheng Qin
Mike Zheng Shou
AAML
26
4
0
03 Mar 2023