Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.13549
Cited By
A Survey on Multimodal Large Language Models
23 June 2023
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Survey on Multimodal Large Language Models"
50 / 117 papers shown
Title
VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models
F. Khan
Jun Chen
Youssef Mohamed
Chun-Mei Feng
Mohamed Elhoseiny
VLM
31
0
0
08 May 2025
Fine-Tuning Video-Text Contrastive Model for Primate Behavior Retrieval from Unlabeled Raw Videos
Giulio Cesare Mastrocinque Santo
Patrícia Izar
Irene Delval
Victor de Napole Gregolin
Nina S. T. Hirata
VGen
35
0
0
08 May 2025
CM1 - A Dataset for Evaluating Few-Shot Information Extraction with Large Vision Language Models
Fabian Wolf
Oliver Tüselmann
Arthur Matei
Lukas Hennies
Christoph Rass
Gernot A. Fink
48
0
0
07 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
62
0
0
05 May 2025
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Baoxia Du
H. Du
Dusit Niyato
Ruidong Li
53
0
0
05 May 2025
DualDiff: Dual-branch Diffusion Model for Autonomous Driving with Semantic Fusion
Haoteng Li
Zhao Yang
Zezhong Qian
Gongpeng Zhao
Yuqi Huang
Jun-chen Yu
Huazheng Zhou
Longjun Liu
89
1
0
03 May 2025
Simple Visual Artifact Detection in Sora-Generated Videos
Misora Sugiyama
Hirokatsu Kataoka
EGVM
59
0
0
30 Apr 2025
Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis
Daisuke Niizumi
Daiki Takeuchi
Masahiro Yasuda
Binh Thien Nguyen
Yasunori Ohishi
N. Harada
27
0
0
25 Apr 2025
A Survey of AI Agent Protocols
Y. Yang
Huacan Chai
Y. Song
S. Qi
Muning Wen
...
Gaowei Chang
W. Liu
Ying Wen
Yong Yu
W. Zhang
LLMAG
64
1
0
23 Apr 2025
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Hanlei Zhang
Zhuohang Li
Yeshuang Zhu
Hua Xu
Peiwu Wang
Haige Zhu
Jie Zhou
Jinchao Zhang
27
0
0
23 Apr 2025
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
Tianhui Song
Weixin Feng
Shuai Wang
X. Li
Tiezheng Ge
Bo Zheng
Limin Wang
MoMe
57
0
0
16 Apr 2025
Spatial Audio Processing with Large Language Model on Wearable Devices
Ayushi Mishra
Yang Bai
Priyadarshan Narayanasamy
Nakul Garg
Nirupam Roy
30
0
0
11 Apr 2025
Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation
Y. Li
Bo Liu
Sheng Huang
Z. Zhang
Xiaotong Yuan
Richang Hong
41
0
0
31 Mar 2025
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo
Fan Ma
Linchao Zhu
T. Wang
Fengyun Rao
Yi Yang
LRM
72
0
0
26 Mar 2025
GOAL: Global-local Object Alignment Learning
Hyungyu Choi
Young Kyun Jang
Chanho Eom
VLM
112
0
0
22 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi
Eddy Ilg
M. Keuper
Hideki Tanaka
Masao Utiyama
Raj Dabre
Steffen Eger
Simone Paolo Ponzetto
50
0
0
14 Mar 2025
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei
Faizan Siddiqui
Jiacong Xu
Vishal M. Patel
Shao-Yuan Lo
VLM
137
0
0
10 Mar 2025
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Kaixin Cai
...
Yu-Jie Yuan
J. Han
Lanqing Hong
Hang Xu
Xiaodan Liang
ReLM
LRM
54
6
0
08 Mar 2025
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
Wei-Yao Wang
Zhao Wang
Helen Suzuki
Yoshiyuki Kobayashi
LRM
50
1
0
04 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
111
0
0
02 Mar 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Zhongyang Li
Ziyue Li
Tianyi Zhou
MoE
46
0
0
27 Feb 2025
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Yubo Wang
Jianting Tang
Chaohu Liu
Linli Xu
AAML
53
1
0
23 Feb 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision Team
Leonid Karlinsky
Assaf Arbelle
Abraham Daniels
A. Nassar
...
Sriram Raghavan
T. Syeda-Mahmood
Peter W. J. Staar
Tal Drory
Rogerio Feris
VLM
AI4TS
109
0
0
14 Feb 2025
Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model
Shiryu Ueno
Yoshikazu Hayashi
Shunsuke Nakatsuka
Yusei Yamada
Hiroaki Aizawa
K. Kato
MLLM
VLM
89
0
0
13 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
62
3
0
11 Feb 2025
Large Multimodal Models for Low-Resource Languages: A Survey
Marian Lupascu
Ana-Cristina Rogoz
Mihai-Sorin Stupariu
Radu Tudor Ionescu
56
1
0
08 Feb 2025
Large Language Models for Multi-Robot Systems: A Survey
Peihan Li
Zijian An
Shams Abrar
Lifeng Zhou
LM&Ro
LRM
54
6
0
06 Feb 2025
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
LRM
68
8
0
04 Feb 2025
Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
Bin Zhu
Hui yan Qi
Yinxuan Gui
Jingjing Chen
Chong-Wah Ngo
Ee-Peng Lim
101
1
0
31 Jan 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
70
10
0
28 Jan 2025
Addressing Bias in Generative AI: Challenges and Research Opportunities in Information Management
Xiahua Wei
Naveen Kumar
Han Zhang
61
3
0
22 Jan 2025
Owls are wise and foxes are unfaithful: Uncovering animal stereotypes in vision-language models
Tabinda Aman
Mohammad Nadeem
S. Sohail
Mohammad Anas
Erik Cambria
VLM
59
1
0
21 Jan 2025
Visual RAG: Expanding MLLM visual knowledge without fine-tuning
Mirco Bonomo
Simone Bianco
VLM
63
5
0
18 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner
Yitong Zhou
Mingyue Cheng
Qingyang Mao
Qi Liu
F. Xu
Xin Li
Enhong Chen
LMTD
37
0
0
30 Dec 2024
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Xin Zhang
Yanzhao Zhang
Wen Xie
Mingxin Li
Ziqi Dai
Dingkun Long
Pengjun Xie
Meishan Zhang
Wenjie Li
M. Zhang
114
7
0
22 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
94
5
0
05 Dec 2024
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang
Junkai Chen
Beier Zhu
Tingjin Luo
Yankun Shen
Xu Yang
98
4
0
23 Nov 2024
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
D. Song
Sicheng Lai
Shunian Chen
Lichao Sun
Benyou Wang
116
0
0
06 Nov 2024
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Yangning Li
Yinghui Li
Xinyu Wang
Yong-feng Jiang
Zhen Zhang
...
Hui Wang
Hai-Tao Zheng
Pengjun Xie
Philip S. Yu
Fei Huang
62
15
0
05 Nov 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
23
0
0
26 Oct 2024
SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition
Zechen Li
Shohreh Deldari
Linyao Chen
Hao Xue
Flora D. Salim
34
6
0
14 Oct 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
61
4
0
10 Oct 2024
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
Muhammad Jehanzeb Mirza
Mengjie Zhao
Zhuoyuan Mao
Sivan Doveh
Wei Lin
...
Yuki Mitsufuji
Horst Possegger
Rogerio Feris
Leonid Karlinsky
James Glass
VLM
76
1
0
08 Oct 2024
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Xin Zou
Yizhou Wang
Yibo Yan
Yuanhuiyi Lyu
Kening Zheng
...
Junkai Chen
Peijie Jiang
J. Liu
Chang Tang
Xuming Hu
81
7
0
04 Oct 2024
SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack
Zihao Pan
Weibin Wu
Yuhang Cao
Zibin Zheng
DiffM
AAML
55
1
0
03 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
20
0
0
03 Oct 2024
Multidimensional Human Activity Recognition With Large Language Model: A Conceptual Framework
Syed Mhamudul Hasan
24
0
0
16 Sep 2024
DrLLM: Prompt-Enhanced Distributed Denial-of-Service Resistance Method with Large Language Models
Zhenyu Yin
Shang Liu
Guangyuan Xu
35
0
0
11 Sep 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Andrew Tao
Zhiding Yu
Guilin Liu
Guilin Liu
MLLM
25
53
0
28 Aug 2024
1
2
3
Next