What Makes Training Multi-Modal Classification Networks Hard?


29 May 2019
Weiyao Wang
Du Tran
Matt Feiszli

Papers citing "What Makes Training Multi-Modal Classification Networks Hard?"

50 / 79 papers shown
Title
TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition
Feng Liu
Ziwang Fu
Yansen Wang
Qijian Zheng
40
4
0
10 May 2025
See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias
Junehyoung Kwon
Mihyeon Kim
Eunju Lee
Juhwan Choi
Youngbin Kim
60
0
0
18 Mar 2025
DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
Chengxuan Qian
Kai Han
J. Wang
Zhenlong Yuan
Rui Qian
Chongwen Lyu
Jun Chen
54
1
0
09 Mar 2025
FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning
Jason Jingzhou Liu
Yulong Li
Kenneth Shaw
Tony Tao
Ruslan Salakhutdinov
Deepak Pathak
OffRL
65
1
0
24 Feb 2025
A Self-supervised Multimodal Deep Learning Approach to Differentiate Post-radiotherapy Progression from Pseudoprogression in Glioblastoma
A. Gomaa
Yixing Huang
Pluvio Stephan
Katharina Breininger
Benjamin Frey
...
U. Gaipl
Christoph Bert
R. Fietkau
M. Schmidt
F. Putz
89
0
0
06 Feb 2025
Enhancing Scene Classification in Cloudy Image Scenarios: A Collaborative Transfer Method with Information Regulation Mechanism using Optical Cloud-Covered and SAR Remote Sensing Images
Yuze Wang
Rong Xiao
Haifeng Li
Mariana Belgiu
Chao Tao
36
0
0
08 Jan 2025
Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion
Dong Zhang
Kwang-Ting Cheng
MedIm
22
0
0
03 Jan 2025
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin
Oh Hyun-Bin
JungMok Lee
Arda Senocak
Joon Son Chung
Tae-Hyun Oh
MLLM
VLM
48
3
0
23 Oct 2024
Improving Colorectal Cancer Screening and Risk Assessment through Predictive Modeling on Medical Images and Records
Shuai Jiang
Christina Robinson
Joseph Anderson
William Hisey
Lynn Butterly
A. Suriawinata
Saeed Hassanpour
19
0
0
13 Oct 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Zae Myung Kim
Dongyeop Kang
Yao-Yi Chiang
Alfred Hero
25
0
0
02 Oct 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi
Ziquan Liu
M. Elbadawi
Adam Daneshmend
Mine Orlu
Abdul Basit
Andreas Demosthenous
Miguel R. D. Rodrigues
36
2
0
14 Mar 2024
Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers
James Gunn
Zygmunt Lenyk
Anuj Sharma
Andrea Donati
Alexandru Buburuzan
John Redford
Romain Mueller
MDE
38
8
0
22 Dec 2023
Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Muhammad Adi Nugroho
Changick Kim
30
0
0
21 Nov 2023
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models
Chenzhuang Du
Yue Zhao
Chonghua Liao
Jiacheng You
Jie Fu
Hang Zhao
39
2
0
08 Oct 2023
Audio-Visual Speaker Verification via Joint Cross-Attention
R Gnana Praveen
Jahangir Alam
34
6
0
28 Sep 2023
Interpretation on Multi-modal Visual Fusion
Hao Chen
Hao Zhou
Yongjian Deng
36
0
0
19 Aug 2023
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Hong Li
Xingyu Li
Pengbo Hu
Yinuo Lei
Chunxiao Li
Yi Zhou
42
20
0
15 Aug 2023
MultiWave: Multiresolution Deep Architectures through Wavelet Decomposition for Multivariate Time Series Prediction
I. Deznabi
M. Fiterau
AI4TS
35
5
0
16 Jun 2023
Continual Multimodal Knowledge Graph Construction
Xiang Chen
Jintian Zhang
Xiaohan Wang
Ningyu Zhang
Tongtong Wu
Luo Si
Yongheng Wang
Huajun Chen
KELM
CLL
27
14
0
15 May 2023
Patchwork Learning: A Paradigm Towards Integrative Analysis across Diverse Biomedical Data Sources
Suraj Rajendran
Weishen Pan
M. Sabuncu
Yong Chen
Jiayu Zhou
Fei Wang
57
14
0
10 May 2023
Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review
Shanliang Yao
Runwei Guan
Xiaoyu Huang
Zhuoxiao Li
Xiangyu Sha
...
Eng Gee Lim
H. Seo
Ka Lok Man
Xiaohui Zhu
Yutao Yue
41
91
0
20 Apr 2023
Multimodal Hyperspectral Image Classification via Interconnected Fusion
Lu Huo
Jiahao Xia
Leijie Zhang
Haimin Zhang
Min Xu
30
2
0
02 Apr 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
24
38
0
04 Mar 2023
Balanced Audiovisual Dataset for Imbalance Analysis
Wenke Xia
Xu Zhao
Xincheng Pang
Changqing Zhang
Di Hu
37
1
0
14 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
34
1
0
07 Feb 2023
Rethinking Soft Label in Label Distribution Learning Perspective
Seungbum Hong
Jihun Yoon
Bogyu Park
Min-Kook Choi
31
0
0
31 Jan 2023
AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Azin Asgarian
Rohit Saha
Daniel Jakubovitz
Julia Peyre
29
2
0
15 Jan 2023
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Xinsong Zhang
Yan Zeng
Jipeng Zhang
Hang Li
VLM
AI4CE
LRM
19
17
0
12 Jan 2023
A Survey on Human Action Recognition
Zhou Shuchang
29
0
0
20 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
37
43
0
09 Dec 2022
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Yuecong Xu
Haozhi Cao
Zhenghua Chen
Xiaoli Li
Lihua Xie
Jianfei Yang
24
14
0
17 Nov 2022
PMR: Prototypical Modal Rebalance for Multimodal Learning
Yunfeng Fan
Wenchao Xu
Yining Qi
Junxiao Wang
Song Guo
25
62
0
14 Nov 2022
MarginNCE: Robust Sound Localization with a Negative Margin
Sooyoung Park
Arda Senocak
Joon Son Chung
SSL
14
13
0
03 Nov 2022
CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection
Jyh-Jing Hwang
Henrik Kretzschmar
Joshua M. Manela
Sean M. Rafferty
N. Armstrong-Crews
Tiffany Chen
Drago Anguelov
3DPC
25
48
0
17 Oct 2022
Critical Learning Periods for Multisensory Integration in Deep Networks
Michael Kleinman
Alessandro Achille
Stefano Soatto
35
10
0
06 Oct 2022
Uncertainty Estimation for Multi-view Data: The Power of Seeing the Whole Picture
M. Jung
He Zhao
Joanna Dipnall
Belinda Gabbe
Lan Du
UQCV
EDL
57
12
0
06 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
35
120
0
02 Oct 2022
Multimodal Analogical Reasoning over Knowledge Graphs
Ningyu Zhang
Lei Li
Xiang Chen
Xiaozhuan Liang
Shumin Deng
Huajun Chen
54
26
0
01 Oct 2022
Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention
R Gnana Praveen
Eric Granger
P. Cardinal
CVBM
53
31
0
19 Sep 2022
DM$^2$S$^2$: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention
Shunsuke Kitada
Yuki Iwazaki
Riku Togashi
Hitoshi Iyatomi
21
1
0
07 Sep 2022
Progressive Fusion for Multimodal Integration
Shiv Shankar
Laure Thompson
M. Fiterau
31
3
0
01 Sep 2022
Modality Mixer for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Yeonju Park
Muhammad Adi Nugroho
Changick Kim
26
10
0
24 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
46
55
0
20 Aug 2022
UAVM: Towards Unifying Audio and Visual Models
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
30
21
0
29 Jul 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
28
52
0
02 Jun 2022
Structured Attention Composition for Temporal Action Localization
Le Yang
Junwei Han
Tao Zhao
Nian Liu
Dingwen Zhang
37
17
0
20 May 2022
SHAPE: An Unified Approach to Evaluate the Contribution and Cooperation of Individual Modalities
Pengbo Hu
Xingyu Li
Yi Zhou
30
10
0
30 Apr 2022
Trusted Multi-View Classification with Dynamic Evidential Fusion
Zongbo Han
Changqing Zhang
Huazhu Fu
Qiufeng Wang
EDL
28
219
0
25 Apr 2022
Multi-View Hypercomplex Learning for Breast Cancer Screening
Eleonora Lopez
Eleonora Grassucci
Martina Valleriani
Danilo Comminiello
30
8
0
12 Apr 2022