ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

A Survey on Multimodal Large Language Models
arXiv 2306.13549 (v1, v2 latest)

23 June 2023
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM, LRM
arXiv (abs) · PDF · HTML · GitHub (15376★)

Papers citing "A Survey on Multimodal Large Language Models"

50 / 112 papers shown
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
Lei Jiang
Zixun Zhang
Zizhou Wang
Xiaobing Sun
Zhen Li
Liangli Zhen
Xiaohua Xu
AAML
24
0
0
20 Jun 2025
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Vishesh Tripathi
Tanmay Odapally
Indraneel Das
Uday Allu
Biddwan Ahmed
VLM
26
0
0
19 Jun 2025
Multimodal Large Language Models-Enabled UAV Swarm: Towards Efficient and Intelligent Autonomous Aerial Systems
Yuqi Ping
Tianhao Liang
Yunpeng Song
Guangyu Lei
Junwei Wu
...
Rui Shao
Chiya Zhang
Weizheng Zhang
Weijie Yuan
Tingting Zhang
30
0
0
15 Jun 2025
Prioritizing Alignment Paradigms over Task-Specific Model Customization in Time-Series LLMs
Wei Li
Yunyao Cheng
Xinli Hao
Chaohong Ma
Yuxuan Liang
Bin Yang
Christian S. Jensen
Xiaofeng Meng
AI4TS
44
0
0
13 Jun 2025
Reducing Object Hallucination in Large Audio-Language Models via Audio-Aware Decoding
Tzu-wen Hsu
Ke-Han Lu
Cheng-Han Chiang
Hung-yi Lee
AuLLM
32
0
0
08 Jun 2025
Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction
Zhihao Tang
Chaozhuo Li
Litian Zhang
Xi Zhang
DiffM, MedIm
52
9
0
05 Jun 2025
Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models
Youze Wang
Wenbo Hu
Yinpeng Dong
Jing Liu
Hanwang Zhang
Richang Hong
71
2
0
02 Jun 2025
NavBench: Probing Multimodal Large Language Models for Embodied Navigation
Yanyuan Qiao
Haodong Hong
Wenqi Lyu
Dong An
Siqi Zhang
Yutong Xie
Xinyu Wang
Qi Wu
LM&Ro
54
0
0
01 Jun 2025
Spoken question answering for visual queries
Nimrod Shabtay
Zvi Kons
Avihu Dekel
Hagai Aronowitz
R. Hoory
Assaf Arbelle
77
0
0
29 May 2025
Evaluating and Steering Modality Preferences in Multimodal Large Language Model
Yu Zhang
Jinlong Ma
Yongshuai Hou
Xuefeng Bai
Kehai Chen
Yang Xiang
Jun Yu
Min Zhang
97
1
0
27 May 2025
LLM-QFL: Distilling Large Language Model for Quantum Federated Learning
Dev Gurung
Shiva Raj Pokhrel
FedML
211
0
0
24 May 2025
Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation
Xiaozhao Liu
Dinggang Shen
Xihui Liu
86
0
0
21 May 2025
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Baoxia Du
H. Du
Dusit Niyato
Ruidong Li
161
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
321
1
0
05 May 2025
A Survey of AI Agent Protocols
Yue Yang
Huacan Chai
Yangqiu Song
S. Qi
Muning Wen
...
Gaowei Chang
Wen Liu
Ying Wen
Yong Yu
Weinan Zhang
LLMAG
144
11
0
23 Apr 2025
Mimic In-Context Learning for Multimodal Tasks
Yuchu Jiang
Jiale Fu
Chenduo Hao
Xinting Hu
Yingzhe Peng
Xin Geng
Xu Yang
110
0
0
11 Apr 2025
Spatial Audio Processing with Large Language Model on Wearable Devices
Ayushi Mishra
Yang Bai
Priyadarshan Narayanasamy
Nakul Garg
Nirupam Roy
116
1
0
11 Apr 2025
Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation
Yongqian Li
Bo Liu
Sheng Huang
Zhe Zhang
Xiaotong Yuan
Richang Hong
147
1
0
31 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM, OffRL, LRM, AI4CE
429
4
0
26 Mar 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao
Yuwei Niu
Fanqing Meng
Hao Li
Changyao Tian
...
Dianqi Li
X. Zhu
Li Yuan
Jifeng Dai
Yu Cheng
MLLM
152
1
0
25 Mar 2025
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Weiyu Guo
Ziyang Chen
Shaoguang Wang
Jianxiang He
Yijie Xu
Jinhui Ye
Ying Sun
Hui Xiong
121
4
0
17 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi
Eddy Ilg
Margret Keuper
Hideki Tanaka
Masao Utiyama
Raj Dabre
Steffen Eger
Simone Paolo Ponzetto
209
0
0
14 Mar 2025
Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs
Wenzhuo Xu
Zhipeng Wei
Xiongtao Sun
Deyue Zhang
Dongdong Yang
Quanchen Zou
Xinming Zhang
AAML
92
0
0
10 Mar 2025
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei
Faizan Siddiqui
Jiacong Xu
Vishal M. Patel
Shao-Yuan Lo
VLM
483
1
0
10 Mar 2025
See What You Are Told: Visual Attention Sink in Large Multimodal Models
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
171
10
0
05 Mar 2025
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
Wei-Yao Wang
Zhao Wang
Helen Suzuki
Yoshiyuki Kobayashi
LRM
108
1
0
04 Mar 2025
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng
Tri Cao
Zhirui Chen
Bryan Hooi
VLM
137
3
0
04 Mar 2025
Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models
Tianjie Ju
Yi Hua
Hao Fei
Zhenyu Shao
Yubin Zheng
Haodong Zhao
Mong Li Lee
Wynne Hsu
Zhuosheng Zhang
Gongshen Liu
148
0
0
03 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
437
0
0
02 Mar 2025
Protein Structure Tokenization: Benchmarking and New Recipe
Xinyu Yuan
Zichen Wang
Marcus Collins
Huzefa Rangwala
62
1
0
28 Feb 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Zhongyang Li
Ziyue Li
Dinesh Manocha
MoE
150
0
0
27 Feb 2025
Introducing Visual Perception Token into Multimodal Large Language Model
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM, LRM
173
4
0
24 Feb 2025
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Yubo Wang
Jianting Tang
Chaohu Liu
Linli Xu
AAML
192
1
0
23 Feb 2025
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li
Zhelun Shi
Xuhao Hu
Bowen Dong
Yiran Qin
Xihui Liu
Lu Sheng
Jing Shao
150
2
0
21 Feb 2025
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
Yi Fang
Bowen Jin
Jiacheng Shen
Sirui Ding
Qiaoyu Tan
Jiawei Han
198
2
0
17 Feb 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision Team
Leonid Karlinsky
Assaf Arbelle
Abraham Daniels
A. Nassar
...
Sriram Raghavan
Tanveer Syeda-Mahmood
Peter W. J. Staar
Tal Drory
Rogerio Feris
VLM, AI4TS
196
3
0
14 Feb 2025
Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model
Shiryu Ueno
Yoshikazu Hayashi
Shunsuke Nakatsuka
Yusei Yamada
Hiroaki Aizawa
K. Kato
MLLM, VLM
203
0
0
13 Feb 2025
From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine
Lukas Buess
Matthias Keicher
Nassir Navab
Andreas Maier
Soroosh Tayebi Arasteh
LM&MA
344
2
0
13 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
164
3
0
11 Feb 2025
Large Multimodal Models for Low-Resource Languages: A Survey
Marian Lupascu
Ana-Cristina Rogoz
Mihai-Sorin Stupariu
Radu Tudor Ionescu
185
2
0
08 Feb 2025
Large Language Models for Multi-Robot Systems: A Survey
Peihan Li
Zijian An
Shams Abrar
Lifeng Zhou
LM&Ro, LRM
136
10
0
06 Feb 2025
Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
Bin Zhu
Hui yan Qi
Yinxuan Gui
Jingjing Chen
Chong-Wah Ngo
Ee-Peng Lim
449
2
0
31 Jan 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Guosheng Dong
Xin Wu
AuLLM
184
23
0
28 Jan 2025
Addressing Bias in Generative AI: Challenges and Research Opportunities in Information Management
Xiahua Wei
Naveen Kumar
Han Zhang
135
8
0
22 Jan 2025
Owls are wise and foxes are unfaithful: Uncovering animal stereotypes in vision-language models
Tabinda Aman
Mohammad Nadeem
S. Sohail
Mohammad Anas
Min Zhang
VLM
161
1
0
21 Jan 2025
Visual RAG: Expanding MLLM visual knowledge without fine-tuning
Mirco Bonomo
Simone Bianco
VLM
111
5
0
18 Jan 2025
Large language models for automated scholarly paper review: A survey
Zhenzhen Zhuang
Jiandong Chen
Hongfeng Xu
Yuwen Jiang
Jialiang Lin
111
6
0
17 Jan 2025
Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models
Abdulkadir Erol
Trilok Padhi
Agnik Saha
Ugur Kursuncu
Mehmet Emin Aktas
99
2
0
17 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
191
3
0
10 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
179
15
0
06 Jan 2025