2204.01678
MultiMAE: Multi-modal Multi-task Masked Autoencoders
4 April 2022
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
Papers citing "MultiMAE: Multi-modal Multi-task Masked Autoencoders"
50 / 194 papers shown
Title
Win-Win: Training High-Resolution Vision Transformers from Two Windows
Vincent Leroy
Jérôme Revaud
Thomas Lucas
Philippe Weinzaepfel
ViT
42
2
0
01 Oct 2023
XVO: Generalized Visual Odometry via Cross-Modal Self-Training
Tohida Rehman
Ronit Mandal
Jimuyang Zhang
Debarshi Kumar Sanyal
SSL
33
17
0
28 Sep 2023
CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding
Mingming Zhang
Qingjie Liu
Yunhong Wang
32
5
0
28 Sep 2023
Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback
Teresa Yeo
Oğuzhan Fatih Kar
Zahra Sodagar
Amir Zamir
TTA
OOD
31
3
0
27 Sep 2023
M³3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
24
1
0
26 Sep 2023
MUTEX: Learning Unified Policies from Multimodal Task Specifications
Rutav Shah
Roberto Martín-Martín
Yuke Zhu
OffRL
44
54
0
25 Sep 2023
AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation
S. Du
Weixi Wang
R. Guo
Ruisheng Wang
Yibin Tian
Shengjun Tang
24
13
0
25 Sep 2023
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Bo Yin
Xuying Zhang
Zhongyu Li
Li Liu
Ming-Ming Cheng
Qibin Hou
24
43
0
18 Sep 2023
Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals
Ran Liu
Ellen L. Zippi
Hadi Pouransari
Chris Sandino
Jingping Nie
Hanlin Goh
Erdrin Azemi
Ali Moin
39
12
0
12 Sep 2023
3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation
Sungjun Cho
Dae-Woong Jeong
Sung Moon Ko
Jinwoo Kim
Sehui Han
Seunghoon Hong
Honglak Lee
Moontae Lee
AI4CE
DiffM
38
1
0
08 Sep 2023
Language-Conditioned Path Planning
Amber Xie
Youngwoon Lee
Pieter Abbeel
Stephen James
LM&Ro
33
14
0
31 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
60
9
0
23 Aug 2023
Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
Chenxin Xu
R. Tan
Yuhong Tan
Siheng Chen
Xinchao Wang
Yanfeng Wang
3DH
51
20
0
17 Aug 2023
CroSSL: Cross-modal Self-Supervised Learning for Time-series through Latent Masking
Shohreh Deldari
Dimitris Spathis
Mohammad Malekzadeh
F. Kawsar
Flora D. Salim
Akhil Mathur
27
17
0
31 Jul 2023
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
Hanrong Ye
Dan Xu
MoE
42
26
0
28 Jul 2023
Visual Prompt Flexible-Modal Face Anti-Spoofing
Zitong Yu
Rizhao Cai
Yawen Cui
Ajian Liu
Changsheng Chen
38
6
0
26 Jul 2023
When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review
Maxime Fontana
Michael W. Spratling
Miaojing Shi
50
6
0
25 Jul 2023
Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed
Mohamed Yousef
K. Hussain
Yousef B. Mahdy
ViT
24
4
0
24 Jul 2023
Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection
Guangzhi Wang
Yangyang Guo
Mohan S. Kankanhalli
28
0
0
19 Jul 2023
Deficiency-Aware Masked Transformer for Video Inpainting
Yongsheng Yu
Hengrui Fan
Libo Zhang
VGen
28
9
0
17 Jul 2023
Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training
Yao Wei
Yanchao Sun
Ruijie Zheng
Sai H. Vemprala
Rogerio Bonatti
Shuhang Chen
Ratnesh Madaan
Zhongjie Ba
Ashish Kapoor
Shuang Ma
OffRL
30
15
0
16 Jul 2023
General-Purpose Multimodal Transformer meets Remote Sensing Semantic Segmentation
Nhi Kieu
Kien Nguyen
Sridha Sridharan
Clinton Fookes
ViT
38
3
0
07 Jul 2023
MIMIC: Masked Image Modeling with Image Correspondences
Kalyani Marathe
Mahtab Bigverdi
Nishat Khan
Tuhin Kundu
Patrick Howe
Sharan Ranjit S
Anand Bhattad
Aniruddha Kembhavi
Linda G. Shapiro
Ranjay Krishna
27
0
0
27 Jun 2023
Task-Robust Pre-Training for Worst-Case Downstream Adaptation
Jianghui Wang
Cheng Yang
Xingyu Xie
Cong Fang
Zhouchen Lin
OOD
32
0
0
21 Jun 2023
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
Nicolae-Cătălin Ristea
Florinel-Alin Croitoru
Radu Tudor Ionescu
Marius Popescu
Fahad Shahbaz Khan
M. Shah
ViT
26
20
0
21 Jun 2023
Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection
Sayantan Das
Mojtaba Kolahdouzi
Levent Özparlak
Will Hickie
Ali Etemad
ViT
CVBM
18
3
0
12 Jun 2023
R-MAE: Regions Meet Masked Autoencoders
Duy-Kien Nguyen
Vaibhav Aggarwal
Yanghao Li
Martin R. Oswald
Alexander Kirillov
Cees G. M. Snoek
Xinlei Chen
TPM
34
11
0
08 Jun 2023
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding
Hanrong Ye
Dan Xu
ViT
29
10
0
08 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan
28
7
0
01 Jun 2023
Comparative Analysis of Deep Learning Models for Brand Logo Classification in Real-World Scenarios
Qimao Yang
Huilin Chen
Qiwei Dong
VLM
24
0
0
20 May 2023
Artificial intelligence to advance Earth observation: a perspective
D. Tuia
Konrad Schindler
Begüm Demir
Gustau Camps-Valls
Xiao Xiang Zhu
...
Mihai Datcu
Jorge-Arnulfo Quiané-Ruiz
Volker Markl
Bertrand Le Saux
Rochelle Schneider
31
10
0
15 May 2023
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
39
850
0
09 May 2023
A multimodal dynamical variational autoencoder for audiovisual speech representation learning
Samir Sadok
Simon Leglaive
Laurent Girin
Xavier Alameda-Pineda
Renaud Séguier
28
11
0
05 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
81
6
0
05 May 2023
Modality-invariant Visual Odometry for Embodied Vision
Marius Memmel
Roman Bachmann
Amir Zamir
54
8
0
29 Apr 2023
Incomplete Multimodal Learning for Remote Sensing Data Fusion
Yuxing Chen
Maofan Zhao
Lorenzo Bruzzone
37
3
0
22 Apr 2023
Hard Patches Mining for Masked Image Modeling
Haochen Wang
Kaiyou Song
Junsong Fan
Yuxi Wang
Jin Xie
Zhaoxiang Zhang
37
59
0
12 Apr 2023
Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation
Hanrong Ye
Dan Xu
3DV
21
0
0
03 Apr 2023
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
I. Dave
Mamshad Nayeem Rizve
Chong Chen
M. Shah
TTA
44
16
0
28 Mar 2023
Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder
Tao Sun
Lu Pang
Chao Chen
Haibin Ling
AAML
43
9
0
27 Mar 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
47
11
0
21 Mar 2023
GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
Jihao Liu
Tai Wang
Boxiao Liu
Qihang Zhang
Yu Liu
Hongsheng Li
38
16
0
20 Mar 2023
Efficient Computation Sharing for Multi-Task Visual Scene Understanding
Sara Shoouri
Mingyu Yang
Zichen Fan
Hun-Seok Kim
MoE
26
3
0
16 Mar 2023
Identifiability Results for Multimodal Contrastive Learning
Imant Daunhawer
Alice Bizeul
Emanuele Palumbo
Alexander Marx
Julia E. Vogt
37
38
0
16 Mar 2023
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
Anthony Chen
Kevin Zhang
Renrui Zhang
Zihan Wang
Yuheng Lu
Yandong Guo
Shanghang Zhang
3DPC
70
60
0
14 Mar 2023
Traj-MAE: Masked Autoencoders for Trajectory Prediction
Hao Chen
Jiaze Wang
Kun Shao
Furui Liu
Jianye Hao
Chenyong Guan
Guangyong Chen
Pheng-Ann Heng
60
38
0
12 Mar 2023
Towards General Purpose Medical AI: Continual Learning Medical Foundation Model
Huahui Yi
Ziyuan Qin
Qicheng Lao
Wei Xu
Zekun Jiang
Dequan Wang
Shaoting Zhang
Kang Li
OOD
MedIm
CLL
22
11
0
12 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
44
21
0
04 Mar 2023
Delivering Arbitrary-Modal Semantic Segmentation
Jiaming Zhang
R. Liu
Haowen Shi
Kailun Yang
Simon Reiß
Kunyu Peng
Haodong Fu
Kaiwei Wang
Rainer Stiefelhagen
VLM
51
88
0
02 Mar 2023
Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
Ji Hou
Xiaoliang Dai
Zijian He
Angela Dai
Matthias Nießner
ViT
3DPC
32
16
0
28 Feb 2023