ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.04500
  4. Cited By
Masked Video Distillation: Rethinking Masked Feature Modeling for
  Self-supervised Video Representation Learning
v1v2 (latest)

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

8 December 2022
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
    VGen
ArXiv (abs)PDFHTMLGithub (126★)

Papers citing "Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning"

50 / 66 papers shown
Title
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization
Xiaoqi Wang
Yi Wang
Lap-Pui Chau
20
0
0
17 Jun 2025
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
Darryl Ho
Samuel Madden
AI4TS
5
0
0
14 Jun 2025
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Di Wen
Lei Qi
Kunyu Peng
Kailun Yang
Fei Teng
...
Yufan Chen
R. Liu
Yitian Shi
M. Sarfraz
Rainer Stiefelhagen
50
0
0
03 Jun 2025
Large-scale Self-supervised Video Foundation Model for Intelligent Surgery
Large-scale Self-supervised Video Foundation Model for Intelligent Surgery
Shu Yang
F. Zhou
Leon D. Mayer
Fuxiang Huang
Yiliang Chen
...
Zheng Li
Jing Qin
J. Teoh
Lena Maier-Hein
Hao-tao Chen
62
0
0
03 Jun 2025
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
Baoyu Liang
Qile Su
Shoutai Zhu
Yuchen Liang
Chao Tong
VGen
46
1
0
03 Jun 2025
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Ayush K. Rai
Kyle Min
Tarun Krishna
Feiyan Hu
Alan F. Smeaton
Noel E. O'Connor
VGen
96
0
0
13 May 2025
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Piyush Bagad
Hazel Doughty
Bernard Ghanem
Cees G. M. Snoek
ViTSSL
110
0
0
08 Apr 2025
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Yogesh S Rawat
SSL
473
0
0
08 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
124
0
0
01 Apr 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
247
0
0
30 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
103
0
0
20 Mar 2025
Scaling 4D Representations
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
138
5
0
19 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
130
0
0
18 Dec 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Vinan
183
0
0
24 Nov 2024
Extending Video Masked Autoencoders to 128 frames
Extending Video Masked Autoencoders to 128 frames
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming-Hsuan Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
118
1
0
20 Nov 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient
  Object-Aware Pretraining
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
79
0
0
26 Sep 2024
Across-Game Engagement Modelling via Few-Shot Learning
Across-Game Engagement Modelling via Few-Shot Learning
Kosmas Pinitas
Konstantinos Makantasis
Georgios N. Yannakakis
74
1
0
19 Sep 2024
Data Collection-free Masked Video Modeling
Data Collection-free Masked Video Modeling
Yuchi Ishikawa
Masayoshi Kondo
Yoshimitsu Aoki
ViT
80
1
0
10 Sep 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models
GenRec: Unifying Video Generation and Recognition with Diffusion Models
Zejia Weng
Xitong Yang
Zhen Xing
Zuxuan Wu
Yu-Gang Jiang
VGenDiffM
101
7
0
27 Aug 2024
E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video
  Editing Quality Assessment
E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment
Shangkun Sun
Xiaoyu Liang
S. Fan
Wenxu Gao
Wei-Nan Gao
DiffM
96
0
0
21 Aug 2024
Computer Vision Model Compression Techniques for Embedded Systems: A
  Survey
Computer Vision Model Compression Techniques for Embedded Systems: A Survey
Alexandre Lopes
Fernando Pereira dos Santos
D. Oliveira
Mauricio Schiezaro
Hélio Pedrini
69
9
0
15 Aug 2024
Masked Image Modeling: A Survey
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
166
8
0
13 Aug 2024
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context
  Relation Modeling
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
Seok Hwan Lee
Taein Son
Soo Won Seo
Jisong Kim
Jun Won Choi
89
0
0
07 Aug 2024
How Effective are Self-Supervised Models for Contact Identification in
  Videos
How Effective are Self-Supervised Models for Contact Identification in Videos
Omri Herscovici
Limalka Sadith
Liel David
Daniel Harari
Muhammad Haris Khan
75
0
0
01 Aug 2024
SIGMA:Sinkhorn-Guided Masked Video Modeling
SIGMA:Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
93
7
0
22 Jul 2024
QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention
  Procedures from First-Person View
QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View
T. Vuong
Doanh C. Bui
Jin Tae Kwak
52
0
0
18 Jul 2024
GameVibe: A Multimodal Affective Game Corpus
GameVibe: A Multimodal Affective Game Corpus
M. Barthet
Maria Kaselimi
Kosmas Pinitas
Konstantinos Makantasis
Antonios Liapis
Georgios N. Yannakakis
93
3
0
17 Jun 2024
Self-Supervised Representation Learning with Spatial-Temporal
  Consistency for Sign Language Recognition
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
Weichao Zhao
Wengang Zhou
Hezhen Hu
Min Wang
Houqiang Li
SLR
95
3
0
15 Jun 2024
Thoracic Surgery Video Analysis for Surgical Phase Recognition
Thoracic Surgery Video Analysis for Surgical Phase Recognition
S. Mateen
Niharika Malvia
Syed Abdul Khader
Danny Wang
Deepti Srinivasan
Chi-Fu Jeffrey Yang
Lana Schumacher
Sandeep Manjanna
48
0
0
13 Jun 2024
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang
Shoubin Yu
Elias Stengel-Eskin
Jaehong Yoon
Feng Cheng
Gedas Bertasius
Mohit Bansal
146
70
0
29 May 2024
The SkatingVerse Workshop & Challenge: Methods and Results
The SkatingVerse Workshop & Challenge: Methods and Results
Jian Zhao
Lei Jin
Jianshu Li
Zheng Zhu
Yinglei Teng
...
Shiníchi Satoh
Yandong Guo
Cewu Lu
Junliang Xing
Jane Shengmei Shen
AI4TS
53
0
0
27 May 2024
ARVideo: Autoregressive Pretraining for Self-Supervised Video
  Representation Learning
ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Sucheng Ren
Hongru Zhu
Chen Wei
Yijiang Li
Alan Yuille
Cihang Xie
AI4TSVGenSSL
78
2
0
24 May 2024
A Survey on Backbones for Deep Video Action Recognition
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
60
1
0
09 May 2024
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
Xiaohong Liu
Xiongkuo Min
Guangtao Zhai
Chunyi Li
Tengchuan Kou
...
Qi Yan
Youran Qu
Xiaohui Zeng
Lele Wang
Renjie Liao
104
31
0
25 Apr 2024
Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text
  Consistency and Domain Distribution Gap
Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap
Bowen Qu
Xiaoyu Liang
Shangkun Sun
Wei-Nan Gao
EGVM
115
8
0
21 Apr 2024
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar
Arya Bakhtiar
Danny Tran
Antonio Loquercio
Jathushan Rajasegaran
Yann LeCun
Amir Globerson
Trevor Darrell
EgoV
87
5
0
15 Apr 2024
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision
  Transformers
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
Diana-Nicoleta Grigore
Mariana-Iuliana Georgescu
J. A. Justo
T. Johansen
Andreea-Iuliana Ionescu
Radu Tudor Ionescu
74
0
0
14 Apr 2024
Enhancing Video Transformers for Action Understanding with VLM-aided
  Training
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
69
2
0
24 Mar 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video
  Understanding
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
88
79
0
22 Mar 2024
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video
  Object Segmentation
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
Zixin Zhu
Xuelu Feng
Dongdong Chen
Junsong Yuan
Chunming Qiao
Gang Hua
DiffM
104
8
0
18 Mar 2024
A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity
  Recognition
A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition
Abhi Kamboj
Minh Do
84
4
0
17 Mar 2024
When can we Approximate Wide Contrastive Models with Neural Tangent
  Kernels and Principal Component Analysis?
When can we Approximate Wide Contrastive Models with Neural Tangent Kernels and Principal Component Analysis?
Gautham Govind Anil
Pascal Esser
Debarghya Ghoshdastidar
76
1
0
13 Mar 2024
VideoMAC: Video Masked Autoencoders Meet ConvNets
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei
Tao Chen
XiRuo Jiang
Huafeng Liu
Zeren Sun
Yazhou Yao
VGen
90
10
0
29 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
123
36
0
20 Feb 2024
Revisiting Feature Prediction for Learning Visual Representations from
  Video
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Q. Garrido
Jean Ponce
Xinlei Chen
Michael G. Rabbat
Yann LeCun
Mahmoud Assran
Nicolas Ballas
MDEVLM
155
87
0
15 Feb 2024
Computer Vision for Primate Behavior Analysis in the Wild
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
Florentin Wörgötter
Alexander S. Ecker
122
6
0
29 Jan 2024
MV2MAE: Multi-View Video Masked Autoencoders
MV2MAE: Multi-View Video Masked Autoencoders
Ketul Shah
Robert Crandall
Jie Xu
Peng Zhou
Marian George
Mayank Bansal
Rama Chellappa
70
5
0
29 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
76
0
0
10 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision
  and Beyond
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
103
15
0
31 Dec 2023
ViLA: Efficient Video-Language Alignment for Video Question Answering
ViLA: Efficient Video-Language Alignment for Video Question Answering
Xijun Wang
Junbang Liang
Chun-Kai Wang
Kenan Deng
Yu Lou
Ming-Chyuan Lin
Shan Yang
96
14
0
13 Dec 2023
12
Next