ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

SimMIM: A Simple Framework for Masked Image Modeling

18 November 2021
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Yutong Lin
Jianmin Bao
Zhuliang Yao
Qi Dai
Han Hu
arXiv: 2111.09886

Papers citing "SimMIM: A Simple Framework for Masked Image Modeling"

50 / 849 papers shown
Improving Visual Representation Learning through Perceptual Understanding
Samyakh Tukra
Frederick Hoffman
Ken Chatfield
33
5
0
30 Dec 2022
Local Learning on Transformers via Feature Reconstruction
P. Pathak
Jingwei Zhang
Dimitris Samaras
ViT
24
5
0
29 Dec 2022
Reversible Column Networks
Yuxuan Cai
Yi Zhou
Qi Han
Jianjian Sun
Xiangwen Kong
Jun Yu Li
Xiangyu Zhang
VLM
31
53
0
22 Dec 2022
MaskingDepth: Masked Consistency Regularization for Semi-supervised Monocular Depth Estimation
Jongbeom Baek
Gyeongnyeon Kim
Seonghoon Park
Honggyu An
Matteo Poggi
Seung Wook Kim
MDE
37
0
0
21 Dec 2022
Masked Event Modeling: Self-Supervised Pretraining for Event Cameras
Simone Klenk
David Bonello
Lukas Koestler
Nikita Araslanov
Daniel Cremers
34
23
0
20 Dec 2022
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency
Mingye Xu
Mutian Xu
Tong He
Wanli Ouyang
Yali Wang
Xiaoguang Han
Yu Qiao
34
10
0
20 Dec 2022
Attentive Mask CLIP
Yifan Yang
Weiquan Huang
Yixuan Wei
Houwen Peng
Xinyang Jiang
...
Fangyun Wei
Yin Wang
Han Hu
Lili Qiu
Yuqing Yang
CLIP
VLM
42
27
0
16 Dec 2022
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM
SSL
34
92
0
14 Dec 2022
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Renrui Zhang
Liuhui Wang
Yu Qiao
Peng Gao
Hongsheng Li
3DPC
41
126
0
13 Dec 2022
FastMIM: Expediting Masked Image Modeling Pre-training for Vision
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Yunhe Wang
Chang Xu
33
9
0
13 Dec 2022
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Zhe Zhao
Yudong Li
Cheng-An Hou
Jing-xin Zhao
Rong Tian
...
Xingwu Sun
Zhanhui Kang
Xiaoyong Du
Linlin Shen
Kimmo Yan
VLM
41
23
0
13 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
M. Pantic
SSL
45
49
0
12 Dec 2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
22
35
0
12 Dec 2022
BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios
Zhiwei Lin
Yongtao Wang
Shengxiang Qi
Nan Dong
Ming-Hsuan Yang
3DPC
19
13
0
12 Dec 2022
Masked autoencoders are effective solution to transformer data-hungry
Jia-ju Mao
Honggu Zhou
Xuesong Yin
Binling Nie
MedIm
37
6
0
12 Dec 2022
SEPT: Towards Scalable and Efficient Visual Pre-Training
Yiqi Lin
Huabin Zheng
Huaping Zhong
Jinjing Zhu
Weijia Li
Conghui He
Lin Wang
38
2
0
11 Dec 2022
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
VGen
32
87
0
08 Dec 2022
Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko
Han-Gyu Kim
Byeongho Heo
Sangdoo Yun
Sanghyuk Chun
Geonmo Gu
Wonjae Kim
ViT
27
1
0
08 Dec 2022
Occlusion-Robust FAU Recognition by Mining Latent Space of Masked Autoencoders
Minyang Jiang
Yongwei Wang
Martin J. McKeown
Jane Wang
CVBM
20
2
0
08 Dec 2022
SimVTP: Simple Video Text Pre-training with Masked Autoencoders
Yue Ma
Tianyu Yang
Yin Shan
Xiu Li
41
27
0
07 Dec 2022
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu
Xintao Wang
Yixiao Ge
Ying Shan
Xiaohu Qie
Mike Zheng Shou
DiffM
32
21
0
06 Dec 2022
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Xinlong Wang
Wen Wang
Yue Cao
Chunhua Shen
Tiejun Huang
VLM
MLLM
66
245
0
05 Dec 2022
Location-Aware Self-Supervised Transformers for Semantic Segmentation
Mathilde Caron
N. Houlsby
Cordelia Schmid
ViT
26
12
0
05 Dec 2022
Exploring Stochastic Autoregressive Image Modeling for Visual Representation
Yu-Hang Qi
Fan Yang
Yousong Zhu
Yufei Liu
Liwei Wu
Rui Zhao
Wei Li
DiffM
27
13
0
03 Dec 2022
MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
Lukas Hoyer
Dengxin Dai
Haoran Wang
Luc Van Gool
52
221
0
02 Dec 2022
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu
Biaolong Chen
Yue Liao
Shuwen Xiao
Wenyu Sun
Xiaobo Li
Yousong Zhu
Jinqiao Wang
Si Liu
CLIP
35
11
0
02 Dec 2022
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
42
318
0
01 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
32
33
0
01 Dec 2022
SVFormer: Semi-supervised Video Transformer for Action Recognition
Zhen Xing
Qi Dai
Hang-Rui Hu
Jingjing Chen
Zuxuan Wu
Yu-Gang Jiang
ViT
33
69
0
23 Nov 2022
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
Sara Atito
Muhammad Awais
Wenwu Wang
Mark D. Plumbley
J. Kittler
ViT
18
9
0
23 Nov 2022
SPCXR: Self-supervised Pretraining using Chest X-rays Towards a Domain Specific Foundation Model
Syed Muhammad Anwar
Abhijeet Parida
Sara Atito
Muhammad Awais
G. Nino
Josef Kitler
M. Linguraru
ViT
SSL
OOD
29
6
0
23 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
39
6
0
23 Nov 2022
Contrastive Masked Autoencoders for Self-Supervised Video Hashing
Yuting Wang
Jinpeng Wang
Bin Chen
Ziyun Zeng
Shutao Xia
29
20
0
21 Nov 2022
Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training
Ling Yang
Zhilin Huang
Yang Song
Shenda Hong
Ge Li
Wentao Zhang
Tengjiao Wang
Guohao Li
Ming-Hsuan Yang
33
52
0
21 Nov 2022
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
Sun-Kyoo Hwang
Jaehong Yoon
Youngwan Lee
Sung Ju Hwang
33
6
0
19 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
29
83
0
18 Nov 2022
α DARTS Once More: Enhancing Differentiable Architecture Search by Masked Image Modeling
Bicheng Guo
Shuxuan Guo
Miaojing Shi
Peng Cheng
Shibo He
Jiming Chen
Kaicheng Yu
24
2
0
18 Nov 2022
Weighted Ensemble Self-Supervised Learning
Yangjun Ruan
Saurabh Singh
Warren Morningstar
Alexander A. Alemi
Sergey Ioffe
Ian S. Fischer
Joshua V. Dillon
FedML
29
15
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
42
41
0
17 Nov 2022
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang
Jiahui Chen
Junkun Yuan
Qiang Chen
Jian Wang
...
Jimin Pi
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
VLM
CLIP
50
24
0
17 Nov 2022
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders
W. G. C. Bandara
Naman Patel
A. Gholami
Mehdi Nikkhah
M. Agrawal
Vishal M. Patel
25
39
0
16 Nov 2022
Stare at What You See: Masked Image Modeling without Reconstruction
Hongwei Xue
Peng Gao
Hongyang Li
Yu Qiao
Hao Sun
Houqiang Li
Jiebo Luo
25
31
0
16 Nov 2022
Masked Reconstruction Contrastive Learning with Information Bottleneck Principle
Ziwen Liu
Bonan Li
Congying Han
Tiande Guo
Xuecheng Nie
SSL
34
2
0
15 Nov 2022
Self-supervised remote sensing feature learning: Learning Paradigms, Challenges, and Future Works
Chao Tao
Ji Qi
Mingning Guo
Qing Zhu
Haifeng Li
SSL
31
56
0
15 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
89
681
0
14 Nov 2022
Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding
Zijiao Chen
Jiaxin Qing
Tiange Xiang
Wan Lin Yue
J. Zhou
DiffM
MedIm
29
147
0
13 Nov 2022
MARLIN: Masked Autoencoder for facial video Representation LearnINg
Zhixi Cai
Shreya Ghosh
Kalin Stefanov
Abhinav Dhall
Jianfei Cai
Hamid Rezatofighi
Reza Haffari
Munawar Hayat
ViT
CVBM
27
60
0
12 Nov 2022
Masked Contrastive Representation Learning
Yuan Yao
Nandakishor Desai
M. Palaniswami
SSL
22
8
0
11 Nov 2022
StyleNAT: Giving Each Head a New Perspective
Steven Walton
Ali Hassani
Xingqian Xu
Zhangyang Wang
Humphrey Shi
ViT
31
23
0
10 Nov 2022
Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining
Qiang Chen
Jian Wang
Chuchu Han
Shangang Zhang
Zexian Li
...
Haocheng Feng
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
ViT
VLM
42
45
0
07 Nov 2022