arXiv: 2206.06488
Multimodal Learning with Transformers: A Survey


13 June 2022
P. Xu
Xiatian Zhu
David A. Clifton
    ViT

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 268 papers shown
Fine-Grained Prediction of Reading Comprehension from Eye Movements
Omer Shubi
Yoav Meiri
Cfir Avraham Hadar
Yevgeni Berzak
37
3
0
06 Oct 2024
MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection
Niki Nezakati
Md Kaykobad Reza
Ameya Patil
Mashhour Solh
M. Salman Asif
29
1
0
03 Oct 2024
Multi-modal Cross-domain Self-supervised Pre-training for fMRI and EEG Fusion
Xinxu Wei
K. Zhao
Yong Jiao
Nancy B. Carlisle
Hua Xie
Gregory A. Fonzo
Yu Zhang
28
0
0
27 Sep 2024
A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios
Christian Ganhor
Marta Moscati
Anna Hausberger
Shah Nawaz
Markus Schedl
37
2
0
26 Sep 2024
Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories
Hikaru Asano
Ryo Yonetani
Taiki Sekii
Hiroki Ouchi
74
0
0
19 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
43
6
0
15 Sep 2024
Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics
Wenqing Zhang
Junming Huang
Ruotong Wang
Changsong Wei
Wenqian Huang
Yuxin Qiao
Mamba
32
10
0
13 Sep 2024
What to align in multimodal contrastive learning?
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
29
3
0
11 Sep 2024
ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers
Luoyu Mei
Shuai Wang
Yun Cheng
Ruofeng Liu
Zhimeng Yin
Wenchao Jiang
Shuai Wang
Wei Gong
30
5
0
02 Sep 2024
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description
Zeyu Jin
Jia Jia
Qixin Wang
Kehan Li
Shuoyi Zhou
Songtao Zhou
Xiaoyu Qin
Zhiyong Wu
29
10
0
24 Aug 2024
Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach
Muhammad Saad Saeed
Shah Nawaz
Muhammad Zaigham Zaheer
Muhammad Haris Khan
Karthik Nandakumar
Muhammad Haroon Yousaf
Hassan Sajjad
Tom De Schepper
Markus Schedl
30
0
0
14 Aug 2024
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion
Peiyuan Chen
Zecheng Zhang
Yiping Dong
Li Zhou
Han Wang
35
12
0
14 Aug 2024
Swarm-Net: Firmware Attestation in IoT Swarms using Graph Neural Networks and Volatile Memory
Varun Kohli
Bhavya Kohli
M. Aman
Biplab Sikdar
24
0
0
11 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
35
0
0
08 Aug 2024
MoExtend: Tuning New Experts for Modality and Task Extension
Shanshan Zhong
Shanghua Gao
Zhongzhan Huang
Wushao Wen
Marinka Zitnik
Pan Zhou
VLM
MLLM
MoE
56
6
0
07 Aug 2024
A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications
V. Guarrasi
Fatih Aksu
Camillo Maria Caruso
Francesco Di Feola
Aurora Rofena
Filippo Ruffini
Paolo Soda
OffRL
MedIm
AI4CE
20
12
0
02 Aug 2024
HyperMM: Robust Multimodal Learning with Varying-sized Inputs
Hava Chaptoukaev
Vincenzo Marcianó
Francesco Galati
Maria A. Zuluaga
32
0
0
30 Jul 2024
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos
Aashish Rai
Srinath Sridhar
DiffM
40
4
0
30 Jul 2024
DDAP: Dual-Domain Anti-Personalization against Text-to-Image Diffusion Models
Jing Yang
Runping Xi
Yingxin Lai
Xun Lin
Zitong Yu
DiffM
34
1
0
29 Jul 2024
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models
Xinyu Pi
Mingyuan Wu
Jize Jiang
Haozhen Zheng
Beitong Tian
Chengxiang Zhai
Klara Nahrstedt
Zhiting Hu
VLM
36
1
0
25 Jul 2024
Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities
Muhammad Irzam Liaqat
Shah Nawaz
Muhammad Zaigham Zaheer
M. S. Saeed
Hassan Sajjad
Tom De Schepper
Karthik Nandakumar
Muhammad Haris Khan
21
1
0
23 Jul 2024
Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training
Ye Lin Tun
Chu Myaet Thwal
Minh N. H. Nguyen
Choong Seon Hong
40
0
0
22 Jul 2024
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
Daoyuan Chen
Haibin Wang
Yilun Huang
Ce Ge
Yaliang Li
Bolin Ding
Jingren Zhou
VLM
SyDa
63
0
0
16 Jul 2024
Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei
Siwei Li
Ruoxuan Feng
Di Hu
33
3
0
12 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
57
5
0
11 Jul 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
40
2
0
07 Jul 2024
Multimodal Classification via Modal-Aware Interactive Enhancement
Qing-Yuan Jiang
Zhouyang Chi
Yang Yang
33
3
0
05 Jul 2024
Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection
Zixing Li
Chao Yan
Zhen Lan
Xiaojia Xiang
Han Zhou
Jun Lai
Dengqing Tang
43
0
0
02 Jul 2024
Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review
Moseli Motsóehli
VLM
3DV
30
0
0
28 Jun 2024
Multimodal Prototyping for cancer survival prediction
Andrew H. Song
Richard J. Chen
Guillaume Jaume
Anurag J. Vaidya
Alexander S. Baras
Faisal Mahmood
24
12
0
28 Jun 2024
Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning
Arijit Sehanobish
Avinava Dubey
Krzysztof Choromanski
Somnath Basu Roy Chowdhury
Deepali Jain
Vikas Sindhwani
Snigdha Chaturvedi
ALM
43
1
0
25 Jun 2024
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky
William Rudman
Vedant Palit
Ritambhara Singh
Carsten Eickhoff
33
1
0
24 Jun 2024
In-Context In-Context Learning with Transformer Neural Processes
Matthew Ashman
Cristiana-Diana Diaconu
Adrian Weller
Richard E. Turner
26
3
0
19 Jun 2024
Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
Avinash Maurya
Jie Ye
M. Rafique
Franck Cappello
Bogdan Nicolae
23
2
0
15 Jun 2024
Improving Large Models with Small models: Lower Costs and Better Performance
Dong Chen
Shuo Zhang
Yueting Zhuang
Siliang Tang
Qidong Liu
Hua Wang
Mingliang Xu
39
4
0
15 Jun 2024
MoME: Mixture of Multimodal Experts for Cancer Survival Prediction
Conghao Xiong
Hao Chen
Hao Zheng
Dong Wei
Yefeng Zheng
Joseph J. Y. Sung
Irwin King
MoE
29
10
0
14 Jun 2024
Recent Advances in Federated Learning Driven Large Language Models: A Survey on Architecture, Performance, and Security
Youyang Qu
Ming Liu
Tianqing Zhu
Longxiang Gao
Shui Yu
Wanlei Zhou
MU
FedML
65
2
0
14 Jun 2024
Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark
Gaochang Wu
Yapeng Zhang
Lan Deng
Jingxin Zhang
Tianyou Chai
41
6
0
13 Jun 2024
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
Elaheh Baharlouei
Mahsa Shafaei
Yigeng Zhang
Hugo Jair Escalante
Thamar Solorio
38
0
0
12 Jun 2024
UEMM-Air: Make Unmanned Aerial Vehicles Perform More Multi-modal Tasks
Liang Yao
Liang Yao
Shengxiang Xu
Chuanyi Zhang
Xinlei Zhang
Ting Wu
Zequan Wang
Shimin Di
Jun Zhou
39
0
0
10 Jun 2024
CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
Matthew Fortier
Mats L. Richter
O. Sonnentag
Chris Pal
AI4CE
26
0
0
07 Jun 2024
ArMeme: Propagandistic Content in Arabic Memes
Firoj Alam
A. Hasnat
Fatema Ahmed
Md. Arid Hasan
Maram Hasanain
48
7
0
06 Jun 2024
MiniGPT-Reverse-Designing: Predicting Image Adjustments Utilizing MiniGPT-4
Vahid Azizi
Fatemeh Koochaki
VLM
48
0
0
03 Jun 2024
Robust Multi-Modal Speech In-Painting: A Sequence-to-Sequence Approach
Mahsa Kadkhodaei Elyaderani
Shahram Shirani
26
0
0
02 Jun 2024
From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems
Jianliang He
Siyu Chen
Fengzhuo Zhang
Zhuoran Yang
LM&Ro
LLMAG
44
2
0
30 May 2024
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Aman Chadha
Eugenio Culurciello
43
14
0
28 May 2024
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao
Mengxi Chen
Tianjie Dai
Jiangchao Yao
Bo Han
Ya-Qin Zhang
Yanfeng Wang
NoLa
44
3
0
27 May 2024
ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection
Ziying Song
Feiyang Jia
Hongyu Pan
Yadan Luo
Caiyan Jia
Guoxin Zhang
Lin Liu
Yang Ji
Lei Yang
Li-e Wang
39
9
0
27 May 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin
Sam Whitman McGrath
Danielle J. Williams
Lotem Elber-Dorozko
AI4CE
73
3
0
24 May 2024
Transformers for Image-Goal Navigation
Nikhilanj Pelluri
ViT
32
0
0
23 May 2024