ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.01678
  4. Cited By
MultiMAE: Multi-modal Multi-task Masked Autoencoders

MultiMAE: Multi-modal Multi-task Masked Autoencoders

4 April 2022
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
ArXivPDFHTML

Papers citing "MultiMAE: Multi-modal Multi-task Masked Autoencoders"

44 / 194 papers shown
Title
Remote Sensing Scene Classification with Masked Image Modeling (MIM)
Remote Sensing Scene Classification with Masked Image Modeling (MIM)
Liya Wang
A. Tien
35
3
0
28 Feb 2023
CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
Jiang Yang
Sheng Guo
Gangshan Wu
Limin Wang
VLM
23
6
0
13 Feb 2023
Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face
  Anti-Spoofing
Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing
Zitong Yu
Rizhao Cai
Yawen Cui
Xin Liu
Yongjian Hu
Alex C. Kot
24
21
0
11 Feb 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Liya Wang
A. Tien
46
7
0
28 Jan 2023
Goal-Guided Transformer-Enabled Reinforcement Learning for Efficient
  Autonomous Navigation
Goal-Guided Transformer-Enabled Reinforcement Learning for Efficient Autonomous Navigation
Wenhui Huang
Yanxin Zhou
Xiangkun He
Chengqi Lv
13
27
0
01 Jan 2023
MaskingDepth: Masked Consistency Regularization for Semi-supervised
  Monocular Depth Estimation
MaskingDepth: Masked Consistency Regularization for Semi-supervised Monocular Depth Estimation
Jongbeom Baek
Gyeongnyeon Kim
Seonghoon Park
Honggyu An
Matteo Poggi
Seung Wook Kim
MDE
37
0
0
21 Dec 2022
Masked Event Modeling: Self-Supervised Pretraining for Event Cameras
Masked Event Modeling: Self-Supervised Pretraining for Event Cameras
Simone Klenk
David Bonello
Lukas Koestler
Nikita Araslanov
Daniel Cremers
29
23
0
20 Dec 2022
MAViL: Masked Audio-Video Learners
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
26
51
0
15 Dec 2022
Audiovisual Masked Autoencoders
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
37
43
0
09 Dec 2022
Deep Architectures for Content Moderation and Movie Content Rating
Deep Architectures for Content Moderation and Movie Content Rating
Fatih Çagatay Akyön
A. Temi̇zel
33
4
0
08 Dec 2022
InternVideo: General Video Foundation Models via Generative and
  Discriminative Learning
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
57
311
0
06 Dec 2022
Images Speak in Images: A Generalist Painter for In-Context Visual
  Learning
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Xinlong Wang
Wen Wang
Yue Cao
Chunhua Shen
Tiejun Huang
VLM
MLLM
66
244
0
05 Dec 2022
Scaling Language-Image Pre-training via Masking
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
27
318
0
01 Dec 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video
  Representation Learning
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar
Ali Etemad
19
20
0
25 Nov 2022
Towards Good Practices for Missing Modality Robust Action Recognition
Towards Good Practices for Missing Modality Robust Action Recognition
Sangmin Woo
Sumin Lee
Yeonju Park
Muhammad Adi Nugroho
Changick Kim
22
43
0
25 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo
  Matching and Optical Flow
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
29
83
0
18 Nov 2022
i-MAE: Are Latent Representations in Masked Autoencoders Linearly
  Separable?
i-MAE: Are Latent Representations in Masked Autoencoders Linearly Separable?
Kevin Zhang
Zhiqiang Shen
20
8
0
20 Oct 2022
MixMask: Revisiting Masking Strategy for Siamese ConvNets
MixMask: Revisiting Masking Strategy for Siamese ConvNets
Kirill Vishniakov
Eric P. Xing
Zhiqiang Shen
18
0
0
20 Oct 2022
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View
  Completion
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Philippe Weinzaepfel
Vincent Leroy
Thomas Lucas
Romain Brégier
Yohann Cabon
Vaibhav Arora
L. Antsfeld
Boris Chidlovskii
G. Csurka
Jérôme Revaud
SSL
42
64
0
19 Oct 2022
Critical Learning Periods for Multisensory Integration in Deep Networks
Critical Learning Periods for Multisensory Integration in Deep Networks
Michael Kleinman
Alessandro Achille
Stefano Soatto
35
10
0
06 Oct 2022
Real-World Robot Learning with Masked Visual Pre-training
Real-World Robot Learning with Masked Visual Pre-training
Ilija Radosavovic
Tete Xiao
Stephen James
Pieter Abbeel
Jitendra Malik
Trevor Darrell
SSL
156
241
0
06 Oct 2022
Backdoor Attacks in the Supply Chain of Masked Image Modeling
Backdoor Attacks in the Supply Chain of Masked Image Modeling
Xinyue Shen
Xinlei He
Zheng Li
Yun Shen
Michael Backes
Yang Zhang
46
8
0
04 Oct 2022
Wheel Impact Test by Deep Learning: Prediction of Location and Magnitude
  of Maximum Stress
Wheel Impact Test by Deep Learning: Prediction of Location and Magnitude of Maximum Stress
S. Shin
Ah-hyeon Jin
Soyoung Yoo
Sunghee Lee
Chang-Gone Kim
S. Heo
Namwoo Kang
24
11
0
03 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
35
120
0
02 Oct 2022
ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver
  Distraction Detection
ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection
Yunsheng Ma
Ziran Wang
ViT
41
14
0
19 Sep 2022
Multi-modal Masked Autoencoders Learn Compositional Histopathological
  Representations
Multi-modal Masked Autoencoders Learn Compositional Histopathological Representations
Wisdom O. Ikezogwo
M. S. Seyfioglu
Linda G. Shapiro
11
5
0
04 Sep 2022
Masked Vision and Language Modeling for Multi-modal Representation
  Learning
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
36
67
0
03 Aug 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision
  and Beyond
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
57
71
0
30 Jul 2022
Transfer Learning for Segmentation Problems: Choose the Right Encoder
  and Skip the Decoder
Transfer Learning for Segmentation Problems: Choose the Right Encoder and Skip the Decoder
Jonas Dippel
Matthias Lenga
Thomas Goerttler
Klaus Obermayer
Johannes Höhne
SSL
24
2
0
29 Jul 2022
SatMAE: Pre-training Transformers for Temporal and Multi-Spectral
  Satellite Imagery
SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery
Yezhen Cong
Samarth Khanna
Chenlin Meng
Patrick Liu
Erik Rozi
Yutong He
Marshall Burke
David B. Lobell
Stefano Ermon
ViT
22
250
0
17 Jul 2022
Masked World Models for Visual Control
Masked World Models for Visual Control
Younggyo Seo
Danijar Hafner
Hao Liu
Fangchen Liu
Stephen James
Kimin Lee
Pieter Abbeel
OffRL
93
147
0
28 Jun 2022
Saccade Mechanisms for Image Classification, Object Detection and
  Tracking
Saccade Mechanisms for Image Classification, Object Detection and Tracking
Saurabh Farkya
Z. Daniels
Aswin Raghavan
David C. Zhang
M. Piacentino
27
3
0
10 Jun 2022
Spatial Entropy as an Inductive Bias for Vision Transformers
Spatial Entropy as an Inductive Bias for Vision Transformers
E. Peruzzo
E. Sangineto
Yahui Liu
Marco De Nadai
Wei Bi
Bruno Lepri
N. Sebe
ViT
MDE
31
1
0
09 Jun 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
47
36
0
01 Jun 2022
GMML is All you Need
GMML is All you Need
Sara Atito
Muhammad Awais
J. Kittler
ViT
VLM
46
18
0
30 May 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for
  Object Detection
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
31
55
0
06 Apr 2022
Weak Augmentation Guided Relational Self-Supervised Learning
Weak Augmentation Guided Relational Self-Supervised Learning
Mingkai Zheng
Shan You
Fei Wang
Chao Qian
Changshui Zhang
Xiaogang Wang
Chang Xu
32
4
0
16 Mar 2022
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with
  Transformers
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers
Jiaming Zhang
Huayao Liu
Kailun Yang
Xinxin Hu
Ruiping Liu
Rainer Stiefelhagen
ViT
34
301
0
09 Mar 2022
Omnivore: A Single Model for Many Visual Modalities
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
L. V. D. van der Maaten
Armand Joulin
Ishan Misra
226
226
0
20 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
311
7,457
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
359
5,811
0
29 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw
  Video, Audio and Text
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
251
577
0
22 Apr 2021
Simple Copy-Paste is a Strong Data Augmentation Method for Instance
  Segmentation
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
Golnaz Ghiasi
Huayu Chen
A. Srinivas
Rui Qian
Nayeon Lee
E. D. Cubuk
Quoc V. Le
Barret Zoph
ISeg
252
969
0
13 Dec 2020
Meta Pseudo Labels
Meta Pseudo Labels
Hieu H. Pham
Zihang Dai
Qizhe Xie
Minh-Thang Luong
Quoc V. Le
VLM
262
656
0
23 Mar 2020
Previous
1234