2203.12602
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
23 March 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
Papers citing
"VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training"
50 / 719 papers shown
Title
Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction
Izzeddin Teeti
Rongali Sai Bhargav
Vivek Singh
Andrew Bradley
Biplab Banerjee
Fabio Cuzzolin
19
1
0
08 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
29
16
0
08 Aug 2023
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
Dongyang Yu
Shihao Wang
Yuan Fang
Wangpeng An
VGen
41
0
0
08 Aug 2023
Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
Ya Jing
Xuelin Zhu
Xingbin Liu
Qie Sima
Taozheng Yang
Yunhai Feng
Tao Kong
LM&Ro
45
16
0
07 Aug 2023
Multimodal Adaptation of CLIP for Few-Shot Action Recognition
Jiazheng Xing
Mengmeng Wang
Xiaojun Hou
Guangwen Dai
Jingdong Wang
Yong-Jin Liu
VLM
22
0
0
03 Aug 2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song
Wenhao Chai
Guanhong Wang
Yucheng Zhang
Haoyang Zhou
...
Tianbo Ye
Yanting Zhang
Yang Lu
Lei Li
Gaoang Wang
VLM
MLLM
27
264
0
31 Jul 2023
MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features
Adrien Bardes
Jean Ponce
Yann LeCun
MDE
41
25
0
24 Jul 2023
Language-based Action Concept Spaces Improve Video Self-Supervised Learning
Kanchana Ranasinghe
Michael S. Ryoo
SSL
VLM
45
12
0
20 Jul 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
33
137
0
20 Jul 2023
Actor-agnostic Multi-label Action Recognition with Multi-modal Query
Anindya Mondal
Sauradip Nag
J. Prada
Xiatian Zhu
Anjan Dutta
23
9
0
20 Jul 2023
Learning Discriminative Visual-Text Representation for Polyp Re-Identification
Suncheng Xiang
Can Liu
Sijia Du
Xiaobo Li
34
1
0
20 Jul 2023
Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection
Guangzhi Wang
Yangyang Guo
Mohan S. Kankanhalli
28
0
0
19 Jul 2023
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun
Calvin Luo
Xingyi Zhou
Anurag Arnab
Cordelia Schmid
OCL
LRM
ViT
38
3
0
17 Jul 2023
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
Hongfei Yan
Yang Liu
Yushen Wei
Zerui Li
Guanbin Li
Liang Lin
36
40
0
17 Jul 2023
Masked Autoencoders for Unsupervised Anomaly Detection in Medical Images
Mariana-Iuliana Georgescu
MedIm
33
7
0
14 Jul 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
33
249
0
13 Jul 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
36
4
0
10 Jul 2023
SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks
Xingyu Lin
John So
Sashwat Mahalingam
Fangchen Liu
Pieter Abbeel
SSL
30
22
0
07 Jul 2023
It is not Sexually Suggestive, It is Educative. Separating Sex Education from Suggestive Content on TikTok Videos
Enfa George
Mihai Surdeanu
15
1
0
06 Jul 2023
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Liangzhe Yuan
N. B. Gundavarapu
Long Zhao
Hao Zhou
Huayu Chen
...
Florian Schroff
Hartwig Adam
Ming Yang
Ting Liu
Boqing Gong
ELM
40
9
0
06 Jul 2023
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition
Guoying Zhao
Zheng Lian
B. Liu
Jianhua Tao
37
38
0
05 Jul 2023
Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning
Xiang Li
Varun Belagali
Jinghuan Shang
Michael S. Ryoo
43
28
0
04 Jul 2023
Human-to-Human Interaction Detection
Zhenhua Wang
Kaining Ying
Jiajun Meng
J. Ning
30
2
0
02 Jul 2023
SpotEM: Efficient Video Search for Episodic Memory
Santhosh Kumar Ramakrishnan
Ziad Al-Halah
Kristen Grauman
VLM
36
9
0
28 Jun 2023
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023
Zhijian Hou
Lei Ji
Difei Gao
Wanjun Zhong
Kun Yan
Chong Li
W. Chan
Chong-Wah Ngo
Nan Duan
Mike Zheng Shou
30
15
0
27 Jun 2023
MAE-GEBD:Winning the CVPR'2023 LOVEU-GEBD Challenge
Yuanxi Sun
Ruifei He
Youzeng Li
Zuwei Huang
Feng Hu
Xu Cheng
Jie Tang
25
1
0
27 Jun 2023
Variance-Covariance Regularization Improves Representation Learning
Jiachen Zhu
Katrina Evtimova
Yubei Chen
Ravid Shwartz-Ziv
Yann LeCun
SSL
28
7
0
23 Jun 2023
FuXi: A cascade machine learning forecasting system for 15-day global weather forecast
Lei Chen
Xiaohui Zhong
Feng-jun Zhang
Yuan Cheng
Yinghui Xu
Yuan Qi
Hao Li
AI4Cl
28
206
0
22 Jun 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
50
14
0
20 Jun 2023
Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023
Jiayi Shao
Xiaohan Wang
Ruijie Quan
Yezhou Yang
EgoV
27
8
0
15 Jun 2023
A Large-Scale Analysis on Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Yogesh S Rawat
SSL
31
3
0
09 Jun 2023
FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow
Zhaoyang Huang
Xiaoyu Shi
Chao Zhang
Qiang Wang
Yijin Li
Hongwei Qin
Jifeng Dai
Xiaogang Wang
Hongsheng Li
33
4
0
08 Jun 2023
Optimizing ViViT Training: Time and Memory Reduction for Action Recognition
Shreyank N. Gowda
Anurag Arnab
Jonathan Huang
ViT
31
4
0
07 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
51
22
0
06 Jun 2023
VR.net: A Real-world Dataset for Virtual Reality Motion Sickness Research
Elliott Wen
Chitralekha Gupta
P. Sasikumar
Mark Billinghurst
James P Wilmott
Emily Skow
Arindam Dey
Suranga Nanayakkara
29
11
0
06 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
46
1
0
04 Jun 2023
VideoComposer: Compositional Video Synthesis with Motion Controllability
Xiang Wang
Hangjie Yuan
Shiwei Zhang
Dayou Chen
Jiuniu Wang
Yingya Zhang
Yujun Shen
Deli Zhao
Jingren Zhou
VGen
DiffM
33
319
0
03 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
45
0
0
02 Jun 2023
Unifying (Machine) Vision via Counterfactual World Modeling
Daniel M. Bear
Kevin T. Feigelis
Honglin Chen
Wanhee Lee
R. Venkatesh
Klemen Kotar
Alex Durango
Daniel L. K. Yamins
VGen
25
13
0
02 Jun 2023
HomE: Homography-Equivariant Video Representation Learning
Anirudh Sriram
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
L. Fei-Fei
Ehsan Adeli
SSL
AI4TS
33
2
0
02 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
45
161
0
01 Jun 2023
On Masked Pre-training and the Marginal Likelihood
Pablo Moreno-Muñoz
Pol G. Recasens
Søren Hauberg
SSL
32
5
0
01 Jun 2023
VIPriors 3: Visual Inductive Priors for Data-Efficient Deep Learning Challenges
Robert-Jan Bruintjes
A. Lengyel
Marcos Baptista-Rios
O. Kayhan
Davide Zambrano
Nergis Tomen
Jan van Gemert
30
9
0
31 May 2023
Benchmarking Diverse-Modal Entity Linking with Generative Models
Sijia Wang
Alexander Hanbo Li
He Zhu
Shenmin Zhang
Chung-Wei Hang
...
William Wang
Zhiguo Wang
Vittorio Castelli
Bing Xiang
Patrick Ng
VLM
43
8
0
27 May 2023
Action Sensitivity Learning for Temporal Action Localization
Jiayi Shao
Xiaohan Wang
Ruijie Quan
Junjun Zheng
Jiang Yang
Yezhou Yang
33
22
0
25 May 2023
Siamese Masked Autoencoders
Agrim Gupta
Jiajun Wu
Jia Deng
Li Fei-Fei
46
49
0
23 May 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
24
9
0
23 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
103
77
0
22 May 2023
Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning
Xiaoxiao Sheng
Zhiqiang Shen
Gang Xiao
3DPC
SSL
28
6
0
22 May 2023
Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition
Nana Li
M. Bennis
Alexandros Iosifidis
Qi Zhang
16
3
0
22 May 2023