ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Multimodal Learning with Transformers: A Survey


13 June 2022
P. Xu
Xiatian Zhu
David A. Clifton
    ViT

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 268 papers shown
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
36
12
0
14 Nov 2023
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
Chancharik Mitra
Abrar Anwar
Rodolfo Corona
Dan Klein
Trevor Darrell
Jesse Thomason
21
1
0
12 Nov 2023
Conceptual Model Interpreter for Large Language Models
Felix Härer
26
7
0
11 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
29
64
0
07 Nov 2023
Dynamic Multimodal Information Bottleneck for Multimodality Classification
Yingying Fang
Shuang Wu
Sheng Zhang
Chao Huang
Tieyong Zeng
Xiaodan Xing
Simon Walsh
Guang Yang
27
7
0
02 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
28
63
0
30 Oct 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
27
1
0
30 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
27
9
0
25 Oct 2023
Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer
Namkyeong Lee
Heewoong Noh
Sungwon Kim
Dongmin Hyun
Gyoung S. Na
Chanyoung Park
23
5
0
24 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
147
146
0
16 Oct 2023
Can We Edit Multimodal Large Language Models?
Siyuan Cheng
Bo Tian
Qingbin Liu
Xi Chen
Yongheng Wang
Huajun Chen
Ningyu Zhang
MLLM
30
28
0
12 Oct 2023
Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation
Md Kaykobad Reza
Ashley Prater-Bennette
M. Salman Asif
28
5
0
06 Oct 2023
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Katikapalli Subramanyam Kalyan
LM&MA
AI4CE
LRM
AILaw
ELM
34
224
0
04 Oct 2023
Modality-aware Transformer for Financial Time series Forecasting
Hajar Emami
Xuan-Hong Dang
Yousaf Shah
Petros Zerfos
AI4TS
34
0
0
02 Oct 2023
Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets
Aakash Tripathi
Asim Waqas
Kavya Venkatesan
Yasin Yilmaz
Ghulam Rasool
AI4CE
29
14
0
30 Sep 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
29
18
0
28 Sep 2023
RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation
Zhexiong Wan
Yuxin Mao
Jing Zhang
Yuchao Dai
3DPC
30
22
0
26 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
31
5
0
23 Sep 2023
RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing
Jiahang Li
Yikang Zhang
Peng Yun
Guangliang Zhou
Qijun Chen
Rui Fan
ViT
OffRL
18
26
0
19 Sep 2023
VulnSense: Efficient Vulnerability Detection in Ethereum Smart Contracts by Multimodal Learning with Graph Neural Network and Language Model
Phan The Duy
Nghi Hoang Khoa
N. H. Quyen
Le Cong Trinh
V. Kiên
Trinh Minh Hoang
V. Pham
14
9
0
15 Sep 2023
Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation
Ling Huang
S. Ruan
P. Decazes
Thierry Denoeux
EDL
MedIm
25
1
0
12 Sep 2023
A Survey on Interpretable Cross-modal Reasoning
Dizhan Xue
Shengsheng Qian
Zuyi Zhou
Changsheng Xu
LRM
29
4
0
05 Sep 2023
Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds
Marcel Hirt
Domenico Campolo
Victoria Leong
Juan-Pablo Ortega
DRL
10
0
0
01 Sep 2023
Multitask Deep Learning for Accurate Risk Stratification and Prediction of Next Steps for Coronary CT Angiography Patients
Juan Lu
Bennamoun
J. Stewart
J. Eshraghian
Yanbin Liu
B. Chow
Frank M. Sanfilippo
Girish Dwivedi
OOD
24
1
0
01 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
62
4
0
28 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
42
3
0
18 Aug 2023
CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
Hongguang Zhu
Yunchao Wei
Xiaodan Liang
Chunjie Zhang
Yao-Min Zhao
VLM
32
28
0
14 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
38
118
0
25 Jul 2023
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
Jinxian Liu
Chen Ju
Chaofan Ma
Yanfeng Wang
Yu Wang
Ya-Qin Zhang
VOS
32
23
0
25 Jul 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
46
14
0
24 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
29
18
0
21 Jul 2023
Transformers in Reinforcement Learning: A Survey
Pranav Agarwal
A. Rahman
P. St-Charles
Simon J. D. Prince
Samira Ebrahimi Kahou
OffRL
24
18
0
12 Jul 2023
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedIm
AI4CE
21
25
0
30 Jun 2023
MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling
Zhenyu Zhang
Wenhao Chai
Zhongyu Jiang
Tianbo Ye
Xiuming Zhang
Lei Li
Gaoang Wang
3DH
26
4
0
29 Jun 2023
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjD
VLM
34
136
0
28 Jun 2023
Generate to Understand for Representation
Changshan Xue
Xiande Zhong
Xiaoqing Liu
VLM
40
0
0
14 Jun 2023
Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training
Alyssa Huang
Peihan Liu
Ryumei Nakada
Linjun Zhang
Wanrong Zhang
VLM
71
5
0
13 Jun 2023
Modality Influence in Multimodal Machine Learning
Abdelhamid Haouhat
Slimane Bellaouar
A. Nehar
H. Cherroun
28
2
0
10 Jun 2023
Towards Arabic Multimodal Dataset for Sentiment Analysis
Abdelhamid Haouhat
Slimane Bellaouar
A. Nehar
H. Cherroun
11
2
0
10 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
35
22
0
06 Jun 2023
Backchannel Detection and Agreement Estimation from Video with Transformer Networks
A. Amer
Chirag Bhuvaneshwara
G. Addluri
Mohammed Maqsood Shaik
Vedant Bonde
Philippe Muller
25
5
0
02 Jun 2023
Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification
David Hoffmann
Kai Norman Clasen
Begüm Demir
16
8
0
02 Jun 2023
Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data
Nathan Vaska
Victoria Helus
LRM
12
1
0
01 Jun 2023
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Shubin Huang
Qiong Wu
Yiyi Zhou
Weijie Chen
Rongsheng Zhang
Xiaoshuai Sun
Rongrong Ji
VLM
VPVLM
LRM
16
0
0
01 Jun 2023
Large language models improve Alzheimer's disease diagnosis using multi-modality data
Yingjie Feng
Jun Wang
Xianfeng Gu
Xiaoyin Xu
M. Zhang
LM&MA
16
10
0
26 May 2023
GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data
Zhitong Xiong
Sining Chen
Yi Wang
Lichao Mou
Xiao Xiang Zhu
32
4
0
24 May 2023
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
Yuan Dong
C. Fang
Liefeng Bo
Zilong Dong
Ping Tan
MDE
ViT
17
10
0
21 May 2023
Efficient Multimodal Neural Networks for Trigger-less Voice Assistants
Sai Srujana Buddi
U. Sarawgi
Tashweena Heeramun
Karan Sawnhey
Ed Yanosik
Saravana Rathinam
Saurabh N. Adya
25
5
0
20 May 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
34
1
0
12 May 2023
Multimodal Understanding Through Correlation Maximization and Minimization
Yi Shi
Marc Niethammer
33
0
0
04 May 2023