Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.16727
Cited By
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
29 March 2023
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking"
50 / 225 papers shown
Title
Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model
Till Grutschus
Ola Karrar
Emir Esenov
Ekta Vats
18
0
0
29 Jan 2024
MV2MAE: Multi-View Video Masked Autoencoders
Ketul Shah
Robert Crandall
Jie Xu
Peng Zhou
Marian George
Mayank Bansal
Rama Chellappa
35
4
0
29 Jan 2024
Multi-modality action recognition based on dual feature shift in vehicle cabin monitoring
Dan Lin
Philip Hann Yung Lee
Yiming Li
Ruoyu Wang
Kim-Hui Yap
Bingbing Li
You Shing Ngim
38
4
0
26 Jan 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
34
14
0
25 Jan 2024
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
Jiaming Zhou
Junwei Liang
Kun-Yu Lin
Jinrui Yang
Wei-Shi Zheng
VLM
21
8
0
22 Jan 2024
Understanding Video Transformers via Universal Concept Discovery
M. Kowal
Achal Dave
Rares Ambrus
Adrien Gaidon
Konstantinos G. Derpanis
P. Tokmakov
ViT
37
8
0
19 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
43
0
0
10 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
39
14
0
31 Dec 2023
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
49
4
0
29 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
43
5
0
21 Dec 2023
Bootstrap Masked Visual Modeling via Hard Patches Mining
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tiancai Wang
Xiangyu Zhang
Zhaoxiang Zhang
42
5
0
21 Dec 2023
M-BEV: Masked BEV Perception for Robust Autonomous Driving
Siran Chen
Yue Ma
Yu Qiao
Yali Wang
33
8
0
19 Dec 2023
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar
Yongqin Xian
A. Tonioni
Andrew Zisserman
Federico Tombari
38
12
0
19 Dec 2023
Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception
Tianlin Li
Wentao Wu
Chenglong Li
Zhicheng Zhao
Zhe Chen
Yukai Shi
Jin Tang
46
4
0
15 Dec 2023
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
31
2
0
11 Dec 2023
Counterfactual World Modeling for Physical Dynamics Understanding
Rahul Venkatesh
Honglin Chen
Kevin T. Feigelis
Daniel M. Bear
Khaled Jedoui
...
Wanhee Lee
Sherry Liu
Kevin A. Smith
Judith E. Fan
Daniel L. K. Yamins
VGen
40
1
0
11 Dec 2023
A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation
Yunheng Li
Zhongyu Li
Shanghua Gao
Qilong Wang
Qibin Hou
Ming-Ming Cheng
56
3
0
10 Dec 2023
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
37
4
0
05 Dec 2023
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Min Yang
Huan Gao
Ping Guo
Limin Wang
ViT
33
5
0
04 Dec 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Guohao Li
33
25
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
58
399
0
28 Nov 2023
ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization
Elahe Vahdani
Yingli Tian
21
0
0
27 Nov 2023
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
Zhixi Cai
Shreya Ghosh
Aman Pankaj Adatia
Munawar Hayat
Abhinav Dhall
Kalin Stefanov
21
27
0
26 Nov 2023
Robot Learning in the Era of Foundation Models: A Survey
Xuan Xiao
Jiahang Liu
Zhipeng Wang
Yanmin Zhou
Yong Qi
Qian Cheng
Bin He
Shuo Jiang
AI4CE
LM&Ro
26
27
0
24 Nov 2023
Multi-entity Video Transformers for Fine-Grained Video Representation Learning
Matthew Walmer
Rose Kanjirathinkal
Kai Sheng Tai
Keyur Muzumdar
Taipeng Tian
Abhinav Shrivastava
ViT
28
0
0
17 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
29
64
0
07 Nov 2023
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao
Bingkun Huang
Sen Xing
Gangshan Wu
Yu Qiao
Limin Wang
42
5
0
06 Nov 2023
Pre-training with Random Orthogonal Projection Image Modeling
Maryam Haghighat
Peyman Moghadam
Shaheer Mohamed
Piotr Koniusz
VLM
28
8
0
28 Oct 2023
Bridging The Gaps Between Token Pruning and Full Pre-training via Masked Fine-tuning
Fengyuan Shi
Limin Wang
ViT
38
0
0
26 Oct 2023
S3Aug: Segmentation, Sampling, and Shift for Action Recognition
Taiki Sugiura
Toru Tamaki
AI4TS
27
2
0
23 Oct 2023
POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization
Elahe Vahdani
Yingli Tian
30
1
0
20 Oct 2023
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Yaofang Liu
Xiaodong Cun
Xuebo Liu
Xintao Wang
Yong Zhang
Haoxin Chen
Yang Liu
Tieyong Zeng
Raymond H. F. Chan
Ying Shan
VGen
EGVM
21
127
0
17 Oct 2023
Flow Dynamics Correction for Action Recognition
Lei Wang
Piotr Koniusz
21
10
0
16 Oct 2023
An Unbiased Look at Datasets for Visuo-Motor Pre-Training
Sudeep Dasari
Mohan Kumar Srirama
Unnat Jain
Abhinav Gupta
SSL
34
34
0
13 Oct 2023
Boundary Discretization and Reliable Classification Network for Temporal Action Detection
Zhenying Fang
Jun Yu
Richang Hong
26
0
0
10 Oct 2023
Diffusion Models as Masked Audio-Video Learners
Elvis Nunez
Yanzi Jin
Mohammad Rastegari
Sachin Mehta
Maxwell Horton
25
2
0
05 Oct 2023
Beyond the Benchmark: Detecting Diverse Anomalies in Videos
Yoav Arad
Michael Werman
16
2
0
03 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
35
8
0
02 Oct 2023
SimLVSeg: Simplifying Left Ventricular Segmentation in 2D+Time Echocardiograms with Self- and Weakly-Supervised Learning
F. Maani
Asim Ukaye
Nada Saadi
Numan Saeed
Mohammad Yaqub
129
1
0
30 Sep 2023
M
3
^{3}
3
3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
16
1
0
26 Sep 2023
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios
Francesco Ragusa
Rosario Leonardi
Michele Mazzamuto
Claudia Bonanno
Rosario Scavo
Antonino Furnari
G. Farinella
30
7
0
26 Sep 2023
SoccerNet 2023 Challenges Results
A. Cioppa
Silvio Giancola
Vladimir Somers
Floriane Magera
Xinxing Zhou
...
Yujie Zhong
Zheng Ruan
Zhiheng Li
Zhijian Huang
Ziyu Meng
44
15
0
12 Sep 2023
Temporal Action Localization with Enhanced Instant Discriminability
Ding Shi
Qiong Cao
Yujie Zhong
Shan An
Jian Cheng
Haogang Zhu
Dacheng Tao
39
9
0
11 Sep 2023
Self-Supervised Video Transformers for Isolated Sign Language Recognition
Marcelo Sandoval-Castaneda
Yanhong Li
D. Brentari
Karen Livescu
Gregory Shakhnarovich
SLR
23
2
0
02 Sep 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
39
30
0
21 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
35
9
0
10 Aug 2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song
Wenhao Chai
Guanhong Wang
Yucheng Zhang
Haoyang Zhou
...
Tianbo Ye
Yanting Zhang
Yang Lu
Jenq-Neng Hwang
Gaoang Wang
VLM
MLLM
22
262
0
31 Jul 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
30
137
0
20 Jul 2023
Actor-agnostic Multi-label Action Recognition with Multi-modal Query
Anindya Mondal
Sauradip Nag
J. Prada
Xiatian Zhu
Anjan Dutta
23
9
0
20 Jul 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
33
244
0
13 Jul 2023
Previous
1
2
3
4
5
Next