ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

MultiMAE: Multi-modal Multi-task Masked Autoencoders

4 April 2022
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
arXiv: 2204.01678

Papers citing "MultiMAE: Multi-modal Multi-task Masked Autoencoders"

50 / 194 papers shown
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Roman Bachmann
Oğuzhan Fatih Kar
David Mizrahi
Ali Garjani
Mingfei Gao
David Griffiths
Jiaming Hu
Afshin Dehghan
Amir Zamir
MoE
VLM
MLLM
41
14
0
13 Jun 2024
A²-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder
Lixian Zhang
Yi Zhao
Runmin Dong
Jinxiao Zhang
Shuai Yuan
...
Weijia Li
Wei Liu
Wayne Zhang
Xue Jiang
Haohuan Fu
44
4
0
12 Jun 2024
Interpetable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis
Paolo Bonetti
Alberto Maria Metelli
Marcello Restelli
33
0
0
12 Jun 2024
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Aman Chadha
Eugenio Culurciello
43
14
0
28 May 2024
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao
Mengxi Chen
Tianjie Dai
Jiangchao Yao
Bo Han
Ya Zhang
Yanfeng Wang
NoLa
44
3
0
27 May 2024
HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution
S. Chu
Zhi-chao Dou
Jeng-Shyang Pan
Shaowei Weng
Junbao Li
ViT
38
4
0
08 May 2024
MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
Vishal Nedungadi
A. Kariryaa
Stefan Oehmcke
Serge Belongie
Christian Igel
Nico Lang
45
25
0
04 May 2024
Protein Representation Learning by Capturing Protein Sequence-Structure-Function Relationship
Eunji Ko
Seul Lee
Minseon Kim
Dongki Kim
31
0
0
29 Apr 2024
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Z. Kraus
Kian Kenyon-Dean
Saber Saberian
Maryam Fallah
Peter McLean
...
Chi Vicky Cheng
Kristen Morse
Maureen Makes
Ben Mabey
Berton A. Earnshaw
37
26
0
16 Apr 2024
HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion
Jiahang Li
Peng Yun
Qijun Chen
Rui Fan
38
8
0
04 Apr 2024
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han
Shuai Zhang
Xingjian Shi
Markus Reichstein
31
22
0
01 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification
Guangqian Yang
Kangrui Du
Zhihan Yang
Ye Du
Yongping Zheng
Shujun Wang
42
16
0
25 Mar 2024
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
Hanrong Ye
Dan Xu
DiffM
60
4
0
22 Mar 2024
Multiple-Input Auto-Encoder Guided Feature Selection for IoT Intrusion Detection Systems
Phai Vu Dinh
Diep N. Nguyen
D. Hoang
Nguyen Quang Uy
E. Dutkiewicz
Son Pham Bao
24
1
0
22 Mar 2024
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi
Sanghyeok Lee
Jaewon Chu
Minhyuk Choi
Hyunwoo J. Kim
MoMe
ViT
55
12
0
20 Mar 2024
ViSaRL: Visual Reinforcement Learning Guided by Human Saliency
Anthony Liang
Jesse Thomason
Erdem Biyik
38
7
0
16 Mar 2024
A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction
Anshul Gupta
Samy Tafasca
Arya Farkhondeh
Pierre Vuillecard
J. Odobez
29
2
0
15 Mar 2024
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
ViT
45
7
0
15 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
42
10
0
08 Mar 2024
MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder
Lei Li
Tianfang Zhang
Xinglin Zhang
Jiaqi Liu
Bingqi Ma
Yan-chun Luo
Tao Chen
MedIm
40
0
0
07 Mar 2024
The Common Stability Mechanism behind most Self-Supervised Learning Approaches
Abhishek Jha
Matthew B. Blaschko
Yuki M. Asano
Tinne Tuytelaars
SSL
35
1
0
22 Feb 2024
A Touch, Vision, and Language Dataset for Multimodal Alignment
Letian Fu
Gaurav Datta
Huang Huang
Will Panitch
Jaimyn Drake
Joseph Ortiz
Mustafa Mukadam
Mike Lambeta
Roberto Calandra
Ken Goldberg
VLM
27
32
0
20 Feb 2024
Multiple Random Masking Autoencoder Ensembles for Robust Multimodal Semi-supervised Learning
Alexandru-Raul Todoran
Marius Leordeanu
28
0
0
12 Feb 2024
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Haoyi Zhu
Yating Wang
Di Huang
Weicai Ye
Wanli Ouyang
Tong He
SSL
3DPC
51
20
0
04 Feb 2024
Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation
Ruiping Liu
Jiaming Zhang
Kunyu Peng
Yufan Chen
Ke Cao
Junwei Zheng
M. Sarfraz
Kailun Yang
Rainer Stiefelhagen
VLM
42
8
0
30 Jan 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
34
14
0
25 Jan 2024
Fus-MAE: A cross-attention-based data fusion approach for Masked Autoencoders in remote sensing
Hugo Chan-To-Hing
B. Veeravalli
30
8
0
05 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
39
14
0
31 Dec 2023
Segment Any Events via Weighted Adaptation of Pivotal Tokens
Zhiwen Chen
Zhiyu Zhu
Yifan Zhang
Junhui Hou
Guangming Shi
Jinjian Wu
31
6
0
24 Dec 2023
Semi-supervised Semantic Segmentation Meets Masked Modeling:Fine-grained Locality Learning Matters in Consistency Regularization
W. Pan
Zhe Xu
Jiangpeng Yan
Zihan Wu
R. Tong
Xiu Li
Jianhua Yao
ISeg
28
1
0
14 Dec 2023
PAD: Self-Supervised Pre-Training with Patchwise-Scale Adapter for Infrared Images
Tao Zhang
Kun Ding
Jinyong Wen
Yu Xiong
Zeyu Zhang
Shiming Xiang
Chunhong Pan
30
3
0
13 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
50
63
0
11 Dec 2023
Sense, Predict, Adapt, Repeat: A Blueprint for Design of New Adaptive AI-Centric Sensing Systems
S. Hor
Amin Arbabian
27
1
0
11 Dec 2023
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Yizhou Wang
YiXuan Wu
Shixiang Tang
Weizhen He
Xun Guo
...
Lei Bai
Rui Zhao
Jian Wu
Tong He
Wanli Ouyang
VLM
44
14
0
04 Dec 2023
Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning
Shaohua Dong
Yunhe Feng
Qing Yang
Yan Huang
Dongfang Liu
Heng Fan
VLM
43
18
0
01 Dec 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
38
0
0
28 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
21
18
0
27 Nov 2023
Towards Transferable Multi-modal Perception Representation Learning for Autonomy: NeRF-Supervised Masked AutoEncoder
Xiaohao Xu
38
0
0
23 Nov 2023
Leveraging Multimodal Fusion for Enhanced Diagnosis of Multiple Retinal Diseases in Ultra-wide OCTA
Hao Wei
Peilun Shi
Guitao Bai
Minqing Zhang
Shuangle Li
Wu Yuan
29
0
0
17 Nov 2023
PolyMaX: General Dense Prediction with Mask Transformer
Xuan S. Yang
Liangzhe Yuan
Kimberly Wilber
Astuti Sharma
Xiuye Gu
...
Stephanie Debats
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Liang-Chieh Chen
28
14
0
09 Nov 2023
Learning Discriminative Features for Crowd Counting
Yuehai Chen
Qingzhong Wang
Jing Yang
Badong Chen
Haoyi Xiong
Shaoyi Du
32
6
0
08 Nov 2023
BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping
S. Sastry
Subash Khanal
A. Dhakal
Di Huang
Nathan Jacobs
40
9
0
29 Oct 2023
An Unbiased Look at Datasets for Visuo-Motor Pre-Training
Sudeep Dasari
Mohan Kumar Srirama
Unnat Jain
Abhinav Gupta
SSL
34
34
0
13 Oct 2023
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang
Sha Zhang
Di Huang
Xiaoyang Wu
Haoyi Zhu
...
Hengshuang Zhao
Qibo Qiu
Binbin Lin
Xiaofei He
Wanli Ouyang
SSL
39
44
0
12 Oct 2023
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
Haoyi Zhu
Honghui Yang
Xiaoyang Wu
Di Huang
Sha Zhang
...
Hengshuang Zhao
Chunhua Shen
Yu Qiao
Tong He
Wanli Ouyang
SSL
74
43
0
12 Oct 2023
Pre-Trained Masked Image Model for Mobile Robot Navigation
V. Sharma
Anukriti Singh
Pratap Tokekar
29
0
0
10 Oct 2023
Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning
Yinda Chen
Wei Huang
Shenglong Zhou
Qi Chen
Zhiwei Xiong
28
25
0
06 Oct 2023
Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation
Md Kaykobad Reza
Ashley Prater-Bennette
M. Salman Asif
28
5
0
06 Oct 2023
Sharingan: A Transformer-based Architecture for Gaze Following
Samy Tafasca
Anshul Gupta
J. Odobez
ViT
24
3
0
01 Oct 2023