2310.06383
What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?
10 October 2023
Siting Li
Chenzhuang Du
Yue Zhao
Yu Huang
Hang Zhao
Papers citing
"What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?"
30 / 30 papers shown
Are Multimodal Transformers Robust to Missing Modality?
Mengmeng Ma
Jian Ren
Long Zhao
Davide Testuggine
Xi Peng
ViT
93
154
0
12 Apr 2022
Balanced Multimodal Learning via On-the-fly Gradient Modulation
Xiaokang Peng
Yake Wei
Andong Deng
Dong Wang
Di Hu
69
212
0
29 Mar 2022
Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)
Yu Huang
Junyang Lin
Chang Zhou
Hongxia Yang
Longbo Huang
60
96
0
23 Mar 2022
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
91
170
0
15 Jul 2021
What Makes Multi-modal Learning Better than Single (Provably)
Yu Huang
Chenzhuang Du
Zihui Xue
Xuanyao Chen
Hang Zhao
Longbo Huang
87
265
0
08 Jun 2021
SMIL: Multimodal Learning with Severely Missing Modality
Mengmeng Ma
Jian Ren
Long Zhao
Sergey Tulyakov
Cathy H. Wu
Xi Peng
98
262
0
09 Mar 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
415
4,953
0
24 Feb 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim
Bokyung Son
Ildoo Kim
VLM
CLIP
123
1,749
0
05 Feb 2021
Trusted Multi-View Classification
Zongbo Han
Changqing Zhang
Huazhu Fu
Joey Tianyi Zhou
EDL
51
171
0
03 Feb 2021
Contrastive learning, multi-view redundancy, and linear models
Christopher Tosh
A. Krishnamurthy
Daniel J. Hsu
SSL
77
166
0
24 Aug 2020
TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning
Xinwei Sun
Yilun Xu
Peng Cao
Yuqing Kong
Lingjing Hu
Shan Zhang
Yizhou Wang
60
21
0
14 Jul 2020
Multi-Domain Image Completion for Random Missing Input Data
Liyue Shen
Wentao Zhu
Xiaosong Wang
Lei Xing
John M. Pauly
...
Thomas Sanford
Sherif Mehralivand
Peter L. Choyke
Bradford J. Wood
Daguang Xu
MedIm
64
70
0
10 Jul 2020
Investigating Vulnerability to Adversarial Examples on Multimodal Data Fusion in Deep Learning
Youngjoon Yu
Hong Joo Lee
Byeong Cheon Kim
Jung Uk Kim
Yong Man Ro
AAML
76
18
0
22 May 2020
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
83
252
0
10 Dec 2019
Out-of-distribution Detection in Classifiers via Generation
Sachin Vernekar
Ashish Gaurav
Vahdat Abdelzad
Taylor Denouden
Rick Salay
Krzysztof Czarnecki
OODD
78
83
0
09 Oct 2019
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
Kexin Yi
Chuang Gan
Yunzhu Li
Pushmeet Kohli
Jiajun Wu
Antonio Torralba
J. Tenenbaum
NAI
121
473
0
03 Oct 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
140
247
0
06 Sep 2019
Contrastive Multiview Coding
Yonglong Tian
Dilip Krishnan
Phillip Isola
SSL
169
2,403
0
13 Jun 2019
CentralNet: a Multilayer Approach for Multimodal Fusion
Valentin Vielzeuf
Alexis Lechervy
S. Pateux
F. Jurie
77
171
0
22 Aug 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
102
536
0
09 Apr 2018
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Jiuxiang Gu
Jianfei Cai
Shafiq Joty
Li Niu
G. Wang
VLM
68
361
0
17 Nov 2017
VIGAN: Missing View Imputation with Generative Adversarial Networks
Chao Shang
A. Palmer
Jiangwen Sun
Ko-Shin Chen
Jin Lu
J. Bi
GAN
45
122
0
22 Aug 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
104
2,932
0
26 May 2017
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
115
905
0
23 May 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio Gonzalez
90
381
0
07 Feb 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
304
2,378
0
20 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
345
3,246
0
02 Dec 2016
Convolutional Two-Stream Network Fusion for Video Action Recognition
Christoph Feichtenhofer
A. Pinz
Andrew Zisserman
163
2,611
0
22 Apr 2016
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
211
5,478
0
03 May 2015
A Survey on Multi-view Learning
Chang Xu
Dacheng Tao
Chao Xu
AI4TS
110
1,129
0
20 Apr 2013