ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.09886
  4. Cited By
SimMIM: A Simple Framework for Masked Image Modeling

SimMIM: A Simple Framework for Masked Image Modeling

18 November 2021
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Yutong Lin
Jianmin Bao
Zhuliang Yao
Qi Dai
Han Hu
ArXivPDFHTML

Papers citing "SimMIM: A Simple Framework for Masked Image Modeling"

50 / 849 papers shown
Title
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Ayush K. Rai
Kyle Min
Tarun Krishna
Feiyan Hu
Alan F. Smeaton
Noel E. O'Connor
VGen
31
0
0
13 May 2025
Thoughts on Objectives of Sparse and Hierarchical Masked Image Model
Thoughts on Objectives of Sparse and Hierarchical Masked Image Model
Asahi Miyazaki
Tsuyoshi Okita
22
0
0
12 May 2025
Joint Low-level and High-level Textual Representation Learning with Multiple Masking Strategies
Joint Low-level and High-level Textual Representation Learning with Multiple Masking Strategies
Zhengmi Tang
Yuto Mitsui
Tomo Miyazaki
S. Omachi
34
0
0
11 May 2025
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
Yicheng Song
Tiancheng Lin
Die Peng
Su Yang
Yi Xu
MedIm
31
0
0
10 May 2025
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Hafez Ghaemi
Eilif Muller
Shahab Bakhtiari
49
0
0
06 May 2025
Dual-Domain Masked Image Modeling: A Self-Supervised Pretraining Strategy Using Spatial and Frequency Domain Masking for Hyperspectral Data
Dual-Domain Masked Image Modeling: A Self-Supervised Pretraining Strategy Using Spatial and Frequency Domain Masking for Hyperspectral Data
Shaheer Mohamed
Tharindu Fernando
Sridha Sridharan
Peyman Moghadam
Clinton Fookes
36
0
0
06 May 2025
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
Junrong Yue
Yuhang Zhang
Chuan Qin
Jing Chen
Xiaomin Lie
Xinlei Yu
Wenxin Zhang
Zhendong Zhao
54
0
0
23 Apr 2025
SignX: The Foundation Model for Sign Recognition
SignX: The Foundation Model for Sign Recognition
Sen Fang
Chunyu Sui
Hongwei Yi
C. Neidle
Dimitris N. Metaxas
SLR
40
0
0
22 Apr 2025
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Theodoros Kouzelis
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DiffM
38
0
0
22 Apr 2025
OPO: Making Decision-Focused Data Acquisition Decisions
OPO: Making Decision-Focused Data Acquisition Decisions
Egon Persak
Miguel F. Anjos
27
0
0
21 Apr 2025
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue
Yulin Wang
Chenxin Tao
Pan Liu
Shiji Song
Gao Huang
MedIm
26
0
0
18 Apr 2025
Efficient Masked Image Compression with Position-Indexed Self-Attention
Efficient Masked Image Compression with Position-Indexed Self-Attention
Chengjie Dai
Tiantian Song
Hui Tang
Fangdong Chen
Bowei Yang
Guanghua Song
33
0
0
17 Apr 2025
Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving
Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving
Shumin Wang
Zhuoran Yang
L. Wang
ZhiPeng Tang
Heng Li
Lehan Pan
Sha Zhang
Jie Peng
Jianmin Ji
Y. Zhang
3DPC
46
0
0
17 Apr 2025
Can Masked Autoencoders Also Listen to Birds?
Can Masked Autoencoders Also Listen to Birds?
Lukas Rauch
Ilyass Moummad
René Heinrich
Alexis Joly
Bernhard Sick
Christoph Scholz
29
0
0
17 Apr 2025
A Complex-valued SAR Foundation Model Based on Physically Inspired Representation Learning
A Complex-valued SAR Foundation Model Based on Physically Inspired Representation Learning
M. D. Wang
Hanbo Bi
Yingchao Feng
Linlin Xin
Shuo Gong
Tianqi Wang
Zhiyuan Yan
Peijin Wang
Wenhui Diao
Xian Sun
36
0
0
16 Apr 2025
Search is All You Need for Few-shot Anomaly Detection
Search is All You Need for Few-shot Anomaly Detection
Qishan Wang
Jia Guo
Shuyong Gao
H. Wang
Li Xiong
J. Hu
Hanqi Guo
Wenqiang Zhang
53
0
0
16 Apr 2025
EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation
EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation
Xiangyue Zhang
Jianfang Li
Jiaxu Zhang
Jianqiang Ren
Liefeng Bo
Zhigang Tu
30
0
0
12 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng
Shiliang Zhang
49
0
0
12 Apr 2025
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Pedro Hermosilla
Christian Stippel
Leon Sick
SSL
3DPC
79
0
0
09 Apr 2025
A Survey of Pathology Foundation Model: Progress and Future Directions
A Survey of Pathology Foundation Model: Progress and Future Directions
Conghao Xiong
Hao Chen
Joseph J. Y. Sung
LM&MA
AI4CE
53
0
0
05 Apr 2025
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
Xiao-Hui Li
Fei Yin
Cheng-Lin Liu
29
0
0
05 Apr 2025
MIMRS: A Survey on Masked Image Modeling in Remote Sensing
MIMRS: A Survey on Masked Image Modeling in Remote Sensing
Shabnam Choudhury
Akhil Vasim
Michael Schmitt
Biplab Banerjee
38
0
0
04 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
L. Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
48
0
0
01 Apr 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Chenkai Zhang
Yiming Lei
Zeming Liu
Qingjie Liu
Yixuan Wang
46
0
0
28 Mar 2025
MedCL: Learning Consistent Anatomy Distribution for Scribble-supervised Medical Image Segmentation
MedCL: Learning Consistent Anatomy Distribution for Scribble-supervised Medical Image Segmentation
Ke Zhang
Vishal M. Patel
49
0
0
28 Mar 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Dian Zheng
Ziqi Huang
Hongbo Liu
Kai Zou
Yinan He
...
Yuyao Zhang
Jingwen He
Wei-Shi Zheng
Yu Qiao
Ziwei Liu
EGVM
VGen
48
5
0
27 Mar 2025
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham
Juan C. Caicedo
Bryan A. Plummer
47
0
0
25 Mar 2025
HiRes-FusedMIM: A High-Resolution RGB-DSM Pre-trained Model for Building-Level Remote Sensing Applications
HiRes-FusedMIM: A High-Resolution RGB-DSM Pre-trained Model for Building-Level Remote Sensing Applications
Guneet Mutreja
Philipp Schuegraf
Ksenia Bittner
AI4CE
56
0
0
24 Mar 2025
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting
Maochen Yang
Zekun Li
Jian Zhang
Lei Qi
Yinghuan Shi
41
0
0
23 Mar 2025
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Yue Li
Qi Ma
Runyi Yang
Huapeng Li
Mengjiao Ma
...
E. Konukoglu
Theo Gevers
Luc Van Gool
Martin R. Oswald
Danda Pani Paudel
3DGS
VLM
76
0
0
23 Mar 2025
Should we pre-train a decoder in contrastive learning for dense prediction tasks?
Should we pre-train a decoder in contrastive learning for dense prediction tasks?
S. Quetin
Tapotosh Ghosh
Farhad Maleki
SSL
74
0
0
21 Mar 2025
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang
Jing Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Gaopeng Gou
Qi Wu
VGen
51
1
0
21 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
53
1
0
21 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
59
0
0
20 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
P. Radeva
54
0
0
19 Mar 2025
FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification
FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification
J. Wang
Weiwei Song
Hao Chen
J. Ren
Huimin Zhao
62
0
0
18 Mar 2025
Self-Supervised Pretraining for Fine-Grained Plankton Recognition
Self-Supervised Pretraining for Fine-Grained Plankton Recognition
Joona Kareinen
T. Eerola
K. Kraft
L. Lensu
S. Suikkanen
Heikki Kälviäinen
SSL
174
0
0
14 Mar 2025
Interpretable Image Classification via Non-parametric Part Prototype Learning
Zhijie Zhu
Lei Fan
M. Pagnucco
Yang Song
41
0
0
13 Mar 2025
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Fengxiang Wang
H. Wang
Yixuan Wang
Di Wang
Mingshuo Chen
...
Yangang Sun
Shuo Wang
L. Lan
Wenjing Yang
Jing Zhang
Mamba
80
3
0
13 Mar 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
LM&Ro
46
0
0
10 Mar 2025
M3^33amba: CLIP-driven Mamba Model for Multi-modal Remote Sensing Classification
Mingxiang Cao
Weiying Xie
Xin Zhang
Jiaqing Zhang
Kai Jiang
Jie Lei
Yunsong Li
Mamba
48
0
0
09 Mar 2025
CLICv2: Image Complexity Representation via Content Invariance Contrastive Learning
Shipeng Liu
Liang Zhao
Dengfeng Chen
SSL
101
0
0
09 Mar 2025
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu
Renda Li
Yong Wang
65
0
0
08 Mar 2025
Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation
Wenzhao Xiang
Chang Liu
Hongyang Yu
Xilin Chen
36
0
0
02 Mar 2025
Escaping The Big Data Paradigm in Self-Supervised Representation Learning
Escaping The Big Data Paradigm in Self-Supervised Representation Learning
Carlos Vélez García
Miguel Cazorla
Jorge Pomares
54
0
0
25 Feb 2025
Vision-LSTM: xLSTM as Generic Vision Backbone
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin
M. Beck
Korbinian Poppel
Sepp Hochreiter
Johannes Brandstetter
VLM
61
43
0
24 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
71
8
0
24 Feb 2025
Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
Yongqi Dong
Xingmin Lu
Ruohan Li
Wei Song
B. Arem
Haneen Farah
ViT
107
1
0
21 Feb 2025
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Jingcheng Ni
Yuxin Guo
Yichen Liu
Rui Chen
Lewei Lu
Z. Wu
DiffM
VGen
73
3
0
17 Feb 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul
Thomas M. Sutter
Alain Ryser
Bernhard Schölkopf
Julius von Kügelgen
Julia E. Vogt
88
1
0
10 Feb 2025
1234...151617
Next