ResearchTrend.AI
Masked Autoencoders Are Scalable Vision Learners


11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT
    TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,613 papers shown
Title
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation
Zixuan Chen
Jing Huo
Yangtao Chen
Yang Gao
43
2
0
11 Jan 2025
AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery
Johann Wenckstern
Eeshaan Jain
Kiril Vasilev
Matteo Pariset
Andreas Wicki
Gabriele Gut
Charlotte Bunne
36
1
0
10 Jan 2025
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
Samir Sadok
Simon Leglaive
Laurent Girin
Gaël Richard
Xavier Alameda-Pineda
55
1
0
10 Jan 2025
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu
Nuno Vasconcelos
Xinyu Wang
DiffM
43
5
0
08 Jan 2025
Edit as You See: Image-guided Video Editing via Masked Motion Modeling
Zhi-Lin Huang
Y. Liu
Chujun Qin
Zihan Wang
Dong Zhou
Dong Li
E. Barsoum
DiffM
VGen
46
0
0
08 Jan 2025
Learning Informative Latent Representation for Quantum State Tomography
Hailan Ma
Zhenhong Sun
Daoyi Dong
Dong Gong
50
1
0
08 Jan 2025
Abstracted Shapes as Tokens -- A Generalizable and Interpretable Model for Time-series Classification
Yunshi Wen
Tengfei Ma
Tsui-Wei Weng
Lam M. Nguyen
A. Julius
AI4TS
42
1
0
08 Jan 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLM
VLM
71
28
0
07 Jan 2025
Gaussian Masked Autoencoders
Jathushan Rajasegaran
Xinlei Chen
Ruilong Li
Christoph Feichtenhofer
Jitendra Malik
Shiry Ginosar
3DGS
45
1
0
06 Jan 2025
PiLaMIM: Toward Richer Visual Representations by Integrating Pixel and Latent Masked Image Modeling
Junmyeong Lee
Eui Jun Hwang
Sukmin Cho
Jong C. Park
43
0
0
06 Jan 2025
Human Gaze Boosts Object-Centered Representation Learning
Timothy Schaumlöffel
A. Aubret
Gemma Roig
Jochen Triesch
36
0
0
06 Jan 2025
First-place Solution for Streetscape Shop Sign Recognition Competition
Bin Wang
Li Jing
169
0
0
06 Jan 2025
LWFNet: Coherent Doppler Wind Lidar-Based Network for Wind Field Retrieval
R. Tao
Chong Wang
Hao Chen
Mingjiao Jia
Xiang Shang
...
Yanyu Lu
Yanfeng Huo
Junlin Wu
Xianghui Xue
Xiankang Dou
38
0
0
05 Jan 2025
DeTrack: In-model Latent Denoising Learning for Visual Object Tracking
Xinyu Zhou
Jinglun Li
Lingyi Hong
Kaixun Jiang
Pinxue Guo
Weifeng Ge
Wenqiang Zhang
DiffM
38
0
0
05 Jan 2025
Enhancing Contrastive Learning for Retinal Imaging via Adjusted Augmentation Scales
Zijie Cheng
Yangqiu Song
André Altmann
P. Keane
Yukun Zhou
MedIm
31
0
0
05 Jan 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
VLM
132
3
0
05 Jan 2025
Double-Flow GAN model for the reconstruction of perceived faces from brain activities
Zihao Wang
Jing Zhao
Xuetong Ding
Hui Zhang
CVBM
AI4CE
24
0
0
03 Jan 2025
Exploring Structured Semantic Priors Underlying Diffusion Score for Test-time Adaptation
Mingjia Li
Shuang Li
Tongrui Su
Longhui Yuan
Jian Liang
Wei Li
DiffM
39
0
0
03 Jan 2025
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Lijie Tao
H. Zhang
Haizhao Jing
Yu Liu
Kelu Yao
Guoting Wei
Xizhe Xue
37
0
0
03 Jan 2025
Missing Data as Augmentation in the Earth Observation Domain: A Multi-View Learning Approach
Francisco Mena
Diego Arenas
A. Dengel
36
1
0
03 Jan 2025
Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models
Yifan Zhang
Junhui Hou
66
1
0
03 Jan 2025
Neural Network Diffusion
Kaili Wang
Dongwen Tang
Boya Zeng
Yida Yin
Zhaopan Xu
Yukun Zhou
Zelin Zang
Trevor Darrell
Zhuang Liu
Yang You
DiffM
60
18
0
03 Jan 2025
Keypoint Aware Masked Image Modelling
Madhava Krishna
Convin.AI
73
0
0
03 Jan 2025
Is Segment Anything Model 2 All You Need for Surgery Video Segmentation? A Systematic Evaluation
Cheng Yuan
Jian Jiang
Kunyi Yang
Lv Wu
Rui Wang
...
Yifan Zhou
Wanli Song
Haoran Wang
Qi Dou
Yutong Ban
28
1
0
03 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
69
24
0
31 Dec 2024
ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation
Ting Zhang
Zhiqiang Yuan
Yeshuang Zhu
Jinchao Zhang
DiffM
101
0
0
31 Dec 2024
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)
S. Oota
Zijiao Chen
Manish Gupta
R. Bapi
G. Jobard
F. Alexandre
X. Hinaut
3DV
AI4CE
49
11
0
31 Dec 2024
SwinIA: Self-Supervised Blind-Spot Image Denoising without Convolutions
Mikhail Papkov
P. Chizhov
L. Parts
ViT
47
1
0
31 Dec 2024
Enhancing Visual Representation for Text-based Person Searching
Wei Shen
Ming Fang
Yuxia Wang
Jiafeng Xiao
Diping Li
H. Chen
Ling Xu
Wenbo Zhang
37
1
0
31 Dec 2024
Segmentation of Muscularis Propria in Colon Histopathology Images Using Vision Transformers for Hirschsprung's Disease
Youssef Megahed
Anthony Fuller
Saleh Abou-Alwan
Dina El Demellawy
Adrian D. C. Chan
MedIm
23
0
0
31 Dec 2024
TravelAgent: Generative Agents in the Built Environment
Ariel Noyman
Kai Hu
Kent Larson
AI4CE
42
2
0
25 Dec 2024
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
Shentong Mo
42
0
0
23 Dec 2024
VarAD: Lightweight High-Resolution Image Anomaly Detection via Visual Autoregressive Modeling
Yunkang Cao
Haiming Yao
Wei Luo
Nong Sang
43
3
0
23 Dec 2024
Adaptive Dataset Quantization
Muquan Li
Dongyang Zhang
Qiang Dong
Xiurui Xie
Ke Qin
DD
MQ
88
0
0
22 Dec 2024
Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees
Zehong Wang
Zheyuan Zhang
Tianyi Ma
Nitesh V. Chawla
Chuxu Zhang
Yanfang Ye
AI4CE
79
0
0
21 Dec 2024
IV-tuning: Parameter-Efficient Transfer Learning for Infrared-Visible Tasks
Yaming Zhang
Chenqiang Gao
Fangcen Liu
Junjie Guo
Lan Wang
Xinggan Peng
Deyu Meng
106
0
0
21 Dec 2024
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Cijo Jose
Théo Moutakanni
Dahyun Kang
Federico Baldassarre
Timothée Darcet
...
Maxime Oquab
Oriane Siméoni
Huy V. Vo
Patrick Labatut
Piotr Bojanowski
CLIP
VLM
100
6
0
20 Dec 2024
LEARN: A Unified Framework for Multi-Task Domain Adapt Few-Shot Learning
Bharadwaj Ravichandran
Alexander Lynch
S. Brockman
Brandon RichardWebster
Dawei Du
A. Hoogs
Christopher Funk
ObjD
VLM
73
0
0
20 Dec 2024
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
85
3
0
19 Dec 2024
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Yang Tian
Sizhe Yang
Jia Zeng
P. Wang
Dahua Lin
Hao Dong
Jiangmiao Pang
84
17
0
19 Dec 2024
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
LRM
124
55
0
18 Dec 2024
Mesoscopic Insights: Orchestrating Multi-scale & Hybrid Architecture for Image Manipulation Localization
Xuekang Zhu
Xiaochen Ma
Lei Su
Zhuohang Jiang
Bo Du
Xiwen Wang
Zeyu Lei
Wentao Feng
Chi-Man Pun
Jizhe Zhou
AI4CE
62
4
0
18 Dec 2024
Self-control: A Better Conditional Mechanism for Masked Autoregressive Model
Qiaoying Qu
Shiyu Shen
DiffM
81
0
0
18 Dec 2024
PreMixer: MLP-Based Pre-training Enhanced MLP-Mixers for Large-scale Traffic Forecasting
Tongtong Zhang
Zhiyong Cui
Bingzhang Wang
Yilong Ren
Haiyang Yu
Pan Deng
Yinhai Wang
AI4TS
70
0
0
18 Dec 2024
Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation
Xiaoqi An
Lin Zhao
Chen Gong
Jun Yu Li
Jian Yang
3DH
82
0
0
18 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
76
0
0
18 Dec 2024
Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification
Yudong Han
Haocong Wang
Yupeng Hu
Yongshun Gong
Xuemeng Song
Weili Guan
AI4TS
84
0
0
17 Dec 2024
Efficient Object-centric Representation Learning with Pre-trained Geometric Prior
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
OCL
84
0
0
16 Dec 2024
SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
S. Nagendra
Kashif Rashid
Chaopeng Shen
Daniel Kifer
VLM
76
2
0
16 Dec 2024
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
M. Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Zeang Sheng
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
91
7
0
16 Dec 2024