ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1905.12681
  4. Cited By
What Makes Training Multi-Modal Classification Networks Hard?

What Makes Training Multi-Modal Classification Networks Hard?

29 May 2019
Weiyao Wang
Du Tran
Matt Feiszli
ArXivPDFHTML

Papers citing "What Makes Training Multi-Modal Classification Networks Hard?"

31 / 81 papers shown
Title
Multi-View Hypercomplex Learning for Breast Cancer Screening
Multi-View Hypercomplex Learning for Breast Cancer Screening
Eleonora Lopez
Eleonora Grassucci
Martina Valleriani
Danilo Comminiello
35
8
0
12 Apr 2022
MAP-Gen: An Automated 3D-Box Annotation Flow with Multimodal Attention
  Point Generator
MAP-Gen: An Automated 3D-Box Annotation Flow with Multimodal Attention Point Generator
Chang Liu
Xiaoyan Qian
Xiaojuan Qi
E. Lam
Siew-Chong Tan
Ngai Wong
3DPC
31
11
0
29 Mar 2022
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional
  Emotion Recognition
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
R Gnana Praveen
W. Melo
Nasib Ullah
Haseeb Aslam
Osama Zeeshan
...
M. Pedersoli
Alessandro Lameiras Koerich
Simon L Bacon
P. Cardinal
Eric Granger
22
68
0
28 Mar 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition
  on Modality-Specific Annotated Videos
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
Saghir Alfasly
Jian Lu
C. Xu
Yuru Zou
42
18
0
06 Mar 2022
Dense Voxel Fusion for 3D Object Detection
Dense Voxel Fusion for 3D Object Detection
Anas Mahmoud
Jordan S. K. Hu
Steven L. Waslander
3DPC
32
45
0
02 Mar 2022
Multi-task UNet: Jointly Boosting Saliency Prediction and Disease
  Classification on Chest X-ray Images
Multi-task UNet: Jointly Boosting Saliency Prediction and Disease Classification on Chest X-ray Images
Hongzhi Zhu
R. Rohling
Septimiu Salcudean
19
4
0
15 Feb 2022
Learning from Temporal Gradient for Semi-supervised Action Recognition
Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
12
51
0
25 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
35
73
0
25 Nov 2021
TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial
  Decoding
TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding
Zhengwei Wang
Qi She
A. Smolic
21
9
0
17 Oct 2021
Decoder Fusion RNN: Context and Interaction Aware Decoders for
  Trajectory Prediction
Decoder Fusion RNN: Context and Interaction Aware Decoders for Trajectory Prediction
Edoardo Mello Rella
Jan-Nico Zaech
Alexander Liniger
Luc Van Gool
AI4CE
27
14
0
12 Aug 2021
Multi-modal Residual Perceptron Network for Audio-Video Emotion
  Recognition
Multi-modal Residual Perceptron Network for Audio-Video Emotion Recognition
Xin Chang
W. Skarbek
30
19
0
21 Jul 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
32
159
0
15 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
42
543
0
30 Jun 2021
VidHarm: A Clip Based Dataset for Harmful Content Detection
VidHarm: A Clip Based Dataset for Harmful Content Detection
Johan Edstedt
Amanda Berg
M. Felsberg
Johan Karlsson
Francisca Benavente
Anette Novak
G. Pihlgren
28
2
0
15 Jun 2021
A Review on Explainability in Multimodal Deep Neural Nets
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
29
140
0
17 May 2021
Ego-Exo: Transferring Visual Representations from Third-person to
  First-person Videos
Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos
Yanghao Li
Tushar Nagarajan
Bo Xiong
Kristen Grauman
EgoV
51
84
0
16 Apr 2021
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
J. Komorowski
Monika Wysoczanska
Tomasz Trzciñski
24
55
0
12 Apr 2021
Can audio-visual integration strengthen robustness under multimodal
  attacks?
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
36
37
0
05 Apr 2021
Offboard 3D Object Detection from Point Cloud Sequences
Offboard 3D Object Detection from Point Cloud Sequences
C. Qi
Yin Zhou
Mahyar Najibi
Pei Sun
Khoa T. Vo
Boyang Deng
Dragomir Anguelov
3DPC
42
175
0
08 Mar 2021
Perceiver: General Perception with Iterative Attention
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
91
977
0
04 Mar 2021
Trusted Multi-View Classification
Trusted Multi-View Classification
Zongbo Han
Changqing Zhang
Huazhu Fu
Qiufeng Wang
EDL
31
165
0
03 Feb 2021
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing
  Functional Entropies
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat
Idan Schwartz
A. Schwing
Tamir Hazan
60
90
0
21 Oct 2020
Audio- and Gaze-driven Facial Animation of Codec Avatars
Audio- and Gaze-driven Facial Animation of Codec Avatars
Alexander Richard
Colin S. Lea
Shugao Ma
Juergen Gall
Fernando de la Torre
Yaser Sheikh
CVBM
21
81
0
11 Aug 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
39
23
0
24 Apr 2020
ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
C. Qi
Xinlei Chen
Or Litany
Leonidas J. Guibas
3DPC
195
249
0
29 Jan 2020
Audiovisual SlowFast Networks for Video Recognition
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
Listen to Look: Action Recognition by Previewing Audio
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
29
251
0
10 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
42
428
0
28 Nov 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
59
242
0
06 Sep 2019
Hypothesis Only Baselines in Natural Language Inference
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
190
576
0
02 May 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,465
0
06 Jun 2016
Previous
12