arXiv:2204.01678
MultiMAE: Multi-modal Multi-task Masked Autoencoders
4 April 2022
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
Papers citing "MultiMAE: Multi-modal Multi-task Masked Autoencoders"
50 / 194 papers shown
EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation
Zibin Dong
Fei Ni
Yifu Yuan
Yinchuan Li
Jianye Hao
24
0
0
15 May 2025
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
Xu Zheng
Yuanhuiyi Lyu
Lutao Jiang
Danda Pani Paudel
Luc Van Gool
Xuming Hu
29
0
0
10 May 2025
The Moon's Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction
Tom Sander
Moritz Tenthoff
Kay Wohlfarth
Christian Wöhler
31
0
0
08 May 2025
Multimodal Masked Autoencoder Pre-training for 3D MRI-Based Brain Tumor Analysis with Missing Modalities
Lucas Robinet
Ahmad Berjaoui
Elizabeth Cohen-Jonathan Moyal
26
0
0
01 May 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP
VLM
76
0
0
30 Apr 2025
Adept: Annotation-Denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining
Weizhen He
Yunfeng Yan
Shixiang Tang
Yiheng Deng
Yangyang Zhong
Pengxin Luo
Donglian Qi
VLM
94
1
0
29 Apr 2025
Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation
Duy Nguyen
Quan Huu Do
Khoa D. Doan
Minh N. Do
32
0
0
18 Apr 2025
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
Wentao Wu
X. Wang
Chenglong Li
Bo Jiang
Jin Tang
Bin Luo
Qi Liu
34
0
0
17 Apr 2025
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Artem Zholus
Carl Doersch
Yi Yang
Skanda Koppula
Viorica Patraucean
Xu He
Ignacio Rocco
Mehdi S. M. Sajjadi
Sarath Chandar
Ross Goroshin
30
0
0
08 Apr 2025
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo Yin
Jiao-Long Cao
Ming-Ming Cheng
Qibin Hou
3DPC
MDE
48
0
0
07 Apr 2025
MMGen: Unified Multi-modal Image Generation and Understanding in One Go
Jiepeng Wang
Zhaoqing Wang
H. Pan
Yuan Liu
Dongdong Yu
Changhu Wang
Wenping Wang
DiffM
78
0
0
26 Mar 2025
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham
Juan C. Caicedo
Bryan A. Plummer
47
0
0
25 Mar 2025
HiRes-FusedMIM: A High-Resolution RGB-DSM Pre-trained Model for Building-Level Remote Sensing Applications
Guneet Mutreja
Philipp Schuegraf
Ksenia Bittner
AI4CE
51
0
0
24 Mar 2025
PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes
Xinhua Xu
Hong Liu
Jianbing Wu
Jinfu Liu
DiffM
59
0
0
24 Mar 2025
Unified Human Localization and Trajectory Prediction with Monocular Vision
Po-Chien Luan
Yang Gao
Celine Demonsant
Alexandre Alahi
36
0
0
05 Mar 2025
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
Ibrahim Fayad
Max Zimmer
Martin Schwartz
P. Ciais
Fabian Gieseke
Gabriel Belouze
Sarah Brood
A. D. Truchis
Alexandre d’Aspremont
AI4TS
43
0
0
24 Feb 2025
Challenges of Multi-Modal Coreset Selection for Depth Prediction
Viktor Moskvoretskii
Narek Alvandian
39
0
0
20 Feb 2025
Toward Foundational Model for Sleep Analysis Using a Multimodal Hybrid Self-Supervised Learning Framework
Cheol-Hui Lee
Hakseung Kim
Byung C. Yoon
Dong-Joo Kim
41
0
0
18 Feb 2025
Matrix3D: Large Photogrammetry Model All-in-One
Yuanxun Lu
Jingyang Zhang
Tian Fang
Jean-Daniel Nahmias
Yanghai Tsin
Long Quan
Xun Cao
Yao Yao
Shiwei Li
114
4
0
11 Feb 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul
Thomas M. Sutter
Alain Ryser
Bernhard Schölkopf
Julius von Kügelgen
Julia E. Vogt
88
1
0
10 Feb 2025
Unity by Diversity: Improved Representation Learning in Multimodal VAEs
Thomas M. Sutter
Yang Meng
Andrea Agostini
Daphné Chopard
Norbert Fortin
Julia E. Vogt
Bahbak Shahbaba
Stephan Mandt
SSL
54
2
0
08 Jan 2025
Geospatial Data Fusion: Combining Lidar, SAR, and Optical Imagery with AI for Enhanced Urban Mapping
Sajjad Afroosheh
Mohammadreza Askari
AI4CE
32
0
0
25 Dec 2024
Sensitive Image Classification by Vision Transformers
Hanxian He
Campbell Wilson
Thanh Thi Nguyen
Janis Dalins
ViT
78
0
0
21 Dec 2024
Cross-View Completion Models are Zero-shot Correspondence Estimators
Honggyu An
J. Kim
Seonghoon Park
Jaewoo Jung
Jisang Han
Sunghwan Hong
Seungryong Kim
3DV
80
3
0
12 Dec 2024
Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization
Maxime Fontana
Michael W. Spratling
Miaojing Shi
MoE
VLM
64
0
0
04 Dec 2024
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Yueru Jia
Jiaming Liu
Sixiang Chen
Chenyang Gu
Z. Wang
...
Lily Lee
Pengwei Wang
Zhongyuan Wang
Renrui Zhang
Shanghang Zhang
87
11
0
27 Nov 2024
Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark
Bing Cao
Quanhao Lu
Jiekang Feng
Pengfei Zhu
Q. Hu
Qilong Wang
73
0
0
20 Nov 2024
NeuroNURBS: Learning Efficient Surface Representations for 3D Solids
Jiajie Fan
Babak Gholami
Thomas Bäck
Hao Wang
AI4CE
3DV
28
0
0
16 Nov 2024
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin
Cheng-En Wu
Huanran Li
Jifan Zhang
Yu Hen Hu
Pedro Morgado
36
0
0
16 Nov 2024
Multi-Transmotion: Pre-trained Model for Human Motion Prediction
Yang Gao
Po-Chien Luan
Alexandre Alahi
36
6
0
04 Nov 2024
Disentangling Genotype and Environment Specific Latent Features for Improved Trait Prediction using a Compositional Autoencoder
Anirudha Powadi
Talukder Zaki Jubery
Michael C. Tross
James C. Schnable
Baskar Ganapathysubramanian
CML
CoGe
28
0
0
25 Oct 2024
Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference
Yuta Oshima
Masahiro Suzuki
Y. Matsuo
33
0
0
15 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
24
5
0
10 Oct 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu
Honghui Yang
Yating Wang
Jiange Yang
Limin Wang
Tong He
3DH
51
6
0
10 Oct 2024
Analysis of Spatial augmentation in Self-supervised models in the purview of training and test distributions
Abhishek Jha
Tinne Tuytelaars
30
0
0
26 Sep 2024
Learning Representation for Multitask learning through Self Supervised Auxiliary learning
Seokwon Shin
Hyungrok Do
Youngdoo Son
SSL
26
1
0
25 Sep 2024
What to align in multimodal contrastive learning?
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
27
3
0
11 Sep 2024
IVGF: The Fusion-Guided Infrared and Visible General Framework
Fangcen Liu
Chenqiang Gao
Fang Chen
Pengcheng Li
Junjie Guo
Deyu Meng
29
0
0
02 Sep 2024
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
Linyan Yang
Lukas Hoyer
Mark Weber
Tobias Fischer
Dengxin Dai
Laura Leal-Taixé
Marc Pollefeys
Daniel Cremers
Luc Van Gool
MDE
32
3
0
29 Aug 2024
VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models
Wentao Wu
Fanghua Hong
Xiao Wang
Chenglong Li
Jin Tang
VLM
54
1
0
23 Aug 2024
Symmetric masking strategy enhances the performance of Masked Image Modeling
Khanh-Binh Nguyen
Chae Jung Park
32
0
0
23 Aug 2024
Depth-guided Texture Diffusion for Image Semantic Segmentation
Wei Sun
Yuan Li
Qixiang Ye
Jianbin Jiao
Yanzhao Zhou
DiffM
MDE
31
0
0
17 Aug 2024
Membership Inference Attack Against Masked Image Modeling
Z. Li
Xinlei He
Ning Yu
Yang Zhang
42
1
0
13 Aug 2024
Multistain Pretraining for Slide Representation Learning in Pathology
Guillaume Jaume
Anurag J. Vaidya
Andrew Zhang
Andrew H. Song
Richard J. Chen
S. Sahai
Dandan Mo
Emilio Madrigal
L. Le
Faisal Mahmood
31
11
0
05 Aug 2024
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Yuanwen Yue
Anurag Das
Francis Engelmann
Siyu Tang
J. E. Lenssen
46
24
0
29 Jul 2024
Global atmospheric data assimilation with multi-modal masked autoencoders
T. Vandal
Kate Duffy
Daniel J. McDuff
Yoni Nachmany
Chris Hartshorn
AI4Cl
AI4CE
35
2
0
16 Jul 2024
Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining
Qi Zhang
Tianqi Du
Haotian Huang
Yifei Wang
Yisen Wang
34
3
0
01 Jul 2024
3D-MVP: 3D Multiview Pretraining for Robotic Manipulation
Shengyi Qian
Kaichun Mo
Valts Blukis
David Fouhey
Dieter Fox
Ankit Goyal
34
2
0
26 Jun 2024
Transferable Tactile Transformers for Representation Learning Across Diverse Sensors and Tasks
Jialiang Zhao
Yuxiang Ma
Lirui Wang
Edward H. Adelson
24
16
0
19 Jun 2024
ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
Samar Khanna
Medhanie Irgau
David B. Lobell
Stefano Ermon
VLM
32
4
0
16 Jun 2024