1905.12681
What Makes Training Multi-Modal Classification Networks Hard?
29 May 2019
Weiyao Wang
Du Tran
Matt Feiszli
Papers citing
"What Makes Training Multi-Modal Classification Networks Hard?"
31 / 81 papers shown
Title
Multi-View Hypercomplex Learning for Breast Cancer Screening
Eleonora Lopez
Eleonora Grassucci
Martina Valleriani
Danilo Comminiello
35
8
0
12 Apr 2022
MAP-Gen: An Automated 3D-Box Annotation Flow with Multimodal Attention Point Generator
Chang Liu
Xiaoyan Qian
Xiaojuan Qi
E. Lam
Siew-Chong Tan
Ngai Wong
3DPC
31
11
0
29 Mar 2022
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
R Gnana Praveen
W. Melo
Nasib Ullah
Haseeb Aslam
Osama Zeeshan
...
M. Pedersoli
Alessandro Lameiras Koerich
Simon L Bacon
P. Cardinal
Eric Granger
22
68
0
28 Mar 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
Saghir Alfasly
Jian Lu
C. Xu
Yuru Zou
42
18
0
06 Mar 2022
Dense Voxel Fusion for 3D Object Detection
Anas Mahmoud
Jordan S. K. Hu
Steven L. Waslander
3DPC
32
45
0
02 Mar 2022
Multi-task UNet: Jointly Boosting Saliency Prediction and Disease Classification on Chest X-ray Images
Hongzhi Zhu
R. Rohling
Septimiu Salcudean
19
4
0
15 Feb 2022
Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
12
51
0
25 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
35
73
0
25 Nov 2021
TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding
Zhengwei Wang
Qi She
A. Smolic
21
9
0
17 Oct 2021
Decoder Fusion RNN: Context and Interaction Aware Decoders for Trajectory Prediction
Edoardo Mello Rella
Jan-Nico Zaech
Alexander Liniger
Luc Van Gool
AI4CE
27
14
0
12 Aug 2021
Multi-modal Residual Perceptron Network for Audio-Video Emotion Recognition
Xin Chang
W. Skarbek
30
19
0
21 Jul 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
32
159
0
15 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
42
543
0
30 Jun 2021
VidHarm: A Clip Based Dataset for Harmful Content Detection
Johan Edstedt
Amanda Berg
M. Felsberg
Johan Karlsson
Francisca Benavente
Anette Novak
G. Pihlgren
28
2
0
15 Jun 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
29
140
0
17 May 2021
Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos
Yanghao Li
Tushar Nagarajan
Bo Xiong
Kristen Grauman
EgoV
51
84
0
16 Apr 2021
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
J. Komorowski
Monika Wysoczanska
Tomasz Trzciński
24
55
0
12 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
36
37
0
05 Apr 2021
Offboard 3D Object Detection from Point Cloud Sequences
C. Qi
Yin Zhou
Mahyar Najibi
Pei Sun
Khoa T. Vo
Boyang Deng
Dragomir Anguelov
3DPC
42
175
0
08 Mar 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
91
977
0
04 Mar 2021
Trusted Multi-View Classification
Zongbo Han
Changqing Zhang
Huazhu Fu
Qiufeng Wang
EDL
31
165
0
03 Feb 2021
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat
Idan Schwartz
A. Schwing
Tamir Hazan
60
90
0
21 Oct 2020
Audio- and Gaze-driven Facial Animation of Codec Avatars
Alexander Richard
Colin S. Lea
Shugao Ma
Juergen Gall
Fernando de la Torre
Yaser Sheikh
CVBM
21
81
0
11 Aug 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
39
23
0
24 Apr 2020
ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
C. Qi
Xinlei Chen
Or Litany
Leonidas J. Guibas
3DPC
195
249
0
29 Jan 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
29
251
0
10 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
42
428
0
28 Nov 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
59
242
0
06 Sep 2019
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
190
576
0
02 May 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,465
0
06 Jun 2016