ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.06096
  4. Cited By
Masked Motion Encoding for Self-Supervised Video Representation Learning
v1v2 (latest)

Masked Motion Encoding for Self-Supervised Video Representation Learning

12 October 2022
Xinyu Sun
Peihao Chen
Liang-Chieh Chen
Chan Li
Thomas H. Li
Mingkui Tan
Chuang Gan
ArXiv (abs)PDFHTMLGithub (52★)

Papers citing "Masked Motion Encoding for Self-Supervised Video Representation Learning"

40 / 40 papers shown
Title
Masked Image Modeling: A Survey
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
135
8
0
13 Aug 2024
MaskViT: Masked Visual Pre-Training for Video Prediction
MaskViT: Masked Visual Pre-Training for Video Prediction
Agrim Gupta
Stephen Tian
Yunzhi Zhang
Jiajun Wu
Roberto Martín-Martín
Li Fei-Fei
167
119
0
23 Jun 2022
ComPhy: Compositional Physical Reasoning of Objects and Events from
  Videos
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
Zhenfang Chen
Kexin Yi
Yunzhu Li
Mingyu Ding
Antonio Torralba
J. Tenenbaum
Chuang Gan
CoGeOCL
78
52
0
02 May 2022
Multiview Transformers for Video Recognition
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
78
221
0
12 Jan 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
145
668
0
16 Dec 2021
Self-supervised Video Transformer
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Michael S. Ryoo
ViT
113
88
0
02 Dec 2021
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViTTPM
465
7,757
0
11 Nov 2021
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive
  Learning
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
Hao Tan
Jie Lei
Thomas Wolf
Joey Tianyi Zhou
91
66
0
21 Jun 2021
BEiT: BERT Pre-Training of Image Transformers
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
274
2,826
0
15 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
86
279
0
09 Jun 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation
  Learning
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSLAI4TS
99
262
0
29 Apr 2021
Multiscale Vision Transformers
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
132
1,259
0
22 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
149
1,176
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSLAI4TS
95
128
0
30 Mar 2021
ViViT: A Video Vision Transformer
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
222
2,150
0
29 Mar 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
415
4,953
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
387
2,053
0
09 Feb 2021
Self-supervised Co-training for Video Representation Learning
Self-supervised Co-training for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
242
319
0
19 Oct 2020
Removing the Background by Adding the Background: Towards Background
  Robust Self-supervised Video Representation Learning
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
Jinpeng Wang
Yuting Gao
Ke Li
Yiqi Lin
A. J. Ma
Hao Cheng
Pai Peng
Feiyue Huang
Rongrong Ji
Xing Sun
SSL
95
97
0
12 Sep 2020
Location-aware Graph Convolutional Networks for Video Question Answering
Location-aware Graph Convolutional Networks for Video Question Answering
Deng Huang
Peihao Chen
Runhao Zeng
Qing Du
Mingkui Tan
Chuang Gan
GNNBDL
89
175
0
07 Aug 2020
SpeedNet: Learning the Speediness in Videos
SpeedNet: Learning the Speediness in Videos
Sagie Benaim
Ariel Ephrat
Oran Lang
Inbar Mosseri
William T. Freeman
Michael Rubinstein
Michal Irani
Tali Dekel
69
260
0
13 Apr 2020
Dense Regression Network for Video Grounding
Dense Regression Network for Video Grounding
Runhao Zeng
Haoming Xu
Wenbing Huang
Peihao Chen
Mingkui Tan
Chuang Gan
79
283
0
07 Apr 2020
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video
  Super-Resolution
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution
Xiaoyu Xiang
Yapeng Tian
Yulun Zhang
Y. Fu
J. Allebach
Chenliang Xu
SupR
48
171
0
26 Feb 2020
Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of
  Sentence in Video
Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video
Zhenfang Chen
Lin Ma
Wenhan Luo
Peng Tang
Kwan-Yee K. Wong
44
68
0
25 Jan 2020
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning
Dezhao Luo
Chang-rui Liu
Yu Zhou
Dongbao Yang
Can Ma
QiXiang Ye
Weiping Wang
SSL
61
161
0
02 Jan 2020
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
95
431
0
28 Nov 2019
RandAugment: Practical automated data augmentation with a reduced search
  space
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
234
3,490
0
30 Sep 2019
CutMix: Regularization Strategy to Train Strong Classifiers with
  Localizable Features
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun
Dongyoon Han
Seong Joon Oh
Sanghyuk Chun
Junsuk Choe
Y. Yoo
OOD
619
4,780
0
13 May 2019
Learning Correspondence from the Cycle-Consistency of Time
Learning Correspondence from the Cycle-Consistency of Time
Xinyu Wang
Allan Jabri
Alexei A. Efros
SSL
83
490
0
18 Mar 2019
SlowFast Networks for Video Recognition
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
166
3,274
0
10 Dec 2018
Self-Supervised Spatiotemporal Feature Learning via Video Rotation
  Prediction
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction
Longlong Jing
Xiaodong Yang
Jingen Liu
Yingli Tian
68
156
0
28 Nov 2018
mixup: Beyond Empirical Risk Minimization
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
NoLa
280
9,764
0
25 Oct 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
235
8,019
0
22 May 2017
Temporal Segment Networks for Action Recognition in Videos
Temporal Segment Networks for Action Recognition in Videos
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
114
810
0
08 May 2017
SGDR: Stochastic Gradient Descent with Warm Restarts
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
333
8,130
0
13 Aug 2016
Deep Networks with Stochastic Depth
Deep Networks with Stochastic Depth
Gao Huang
Yu Sun
Zhuang Liu
Daniel Sedra
Kilian Q. Weinberger
215
2,357
0
30 Mar 2016
Rethinking the Inception Architecture for Computer Vision
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DVBDL
883
27,373
0
02 Dec 2015
Spatio-temporal video autoencoder with differentiable memory
Spatio-temporal video autoencoder with differentiable memory
Viorica Patraucean
Ankur Handa
R. Cipolla
87
307
0
19 Nov 2015
Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs
Nitish Srivastava
Elman Mansimov
Ruslan Salakhutdinov
SSL
135
2,591
0
16 Feb 2015
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIPVGen
155
6,162
0
03 Dec 2012
1