ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.13165
  4. Cited By
Multimodal Large Language Models: A Survey

Multimodal Large Language Models: A Survey

22 November 2023
Jiayang Wu
Wensheng Gan
Zefeng Chen
Shicheng Wan
Philip S. Yu
ArXivPDFHTML

Papers citing "Multimodal Large Language Models: A Survey"

32 / 32 papers shown
Title
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
Zhikai Wang
Jiashuo Sun
Weinan Zhang
Zhiqiang Hu
Xin Li
F. Wang
Deli Zhao
VLM
LRM
75
0
0
24 Apr 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Yutong Xia
Ao Qu
Yunhan Zheng
Yihong Tang
Dingyi Zhuang
...
Cathy Wu
R. Zimmermann
Lijun Sun
Roger Zimmermann
Jinhua Zhao
AI4CE
75
0
0
15 Apr 2025
Feature-Aware Malicious Output Detection and Mitigation
Feature-Aware Malicious Output Detection and Mitigation
Weilong Dong
Peiguang Li
Yu Tian
Xinyi Zeng
Fengdi Li
Sirui Wang
AAML
24
0
0
12 Apr 2025
Enforcement Agents: Enhancing Accountability and Resilience in Multi-Agent AI Frameworks
Enforcement Agents: Enhancing Accountability and Resilience in Multi-Agent AI Frameworks
Sagar Tamang
Dibya Jyoti Bora
24
0
0
05 Apr 2025
Are you really listening? Boosting Perceptual Awareness in Music-QA Benchmarks
Are you really listening? Boosting Perceptual Awareness in Music-QA Benchmarks
Yongyi Zang
Sean O'Brien
Taylor Berg-Kirkpatrick
Julian McAuley
Zachary Novack
AuLLM
92
1
0
01 Apr 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
46
0
0
12 Mar 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
P. Wang
Zhongzhi Li
Fei Yin
Dekang Ran
Chenglin Liu
Cheng-Lin Liu
LRM
50
3
0
28 Feb 2025
Visual RAG: Expanding MLLM visual knowledge without fine-tuning
Visual RAG: Expanding MLLM visual knowledge without fine-tuning
Mirco Bonomo
Simone Bianco
VLM
73
5
0
18 Jan 2025
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen
Chengyu Wang
Dakan Wang
Taolin Zhang
Wangyue Li
Xiaofeng He
KELM
80
1
0
23 Nov 2024
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Yangning Li
Yinghui Li
Xinyu Wang
Yong-feng Jiang
Zhen Zhang
...
Hui Wang
Hai-Tao Zheng
Pengjun Xie
Philip S. Yu
Fei Huang
65
15
0
05 Nov 2024
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey
Dianzhi Yu
Xinni Zhang
Yankai Chen
Aiwei Liu
Yifei Zhang
Philip S. Yu
Irwin King
VLM
CLL
44
9
0
07 Oct 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
42
43
0
09 Jul 2024
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in
  the Era of Large Language Models
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Jinliang Lu
Ziliang Pang
Min Xiao
Yaochen Zhu
Rui Xia
Jiajun Zhang
MoMe
46
18
0
08 Jul 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
56
1
0
04 Jul 2024
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
Hao Liang
Jiapeng Li
Tianyi Bai
Xijie Huang
Linzhuang Sun
Zhengren Wang
Conghui He
Bin Cui
Chong Chen
Wentao Zhang
VGen
29
7
0
03 Jul 2024
Visual Reasoning and Multi-Agent Approach in Multimodal Large Language
  Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges
Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges
Mohammed Elhenawy
Ahmad Abutahoun
Taqwa I. Alhadidi
Ahmed Jaber
Huthaifa I. Ashqar
Shadi Jaradat
Ahmed Abdelhay
Sébastien Glaser
A. Rakotonirainy
LLMAG
LRM
31
12
0
26 Jun 2024
Cracking the Code of Juxtaposition: Can AI Models Understand the
  Humorous Contradictions
Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
Zhe Hu
Tuo Liang
Jing Li
Yiren Lu
Yunlai Zhou
Yiran Qiao
Jing Ma
Yu Yin
49
4
0
29 May 2024
LLMs and the Future of Chip Design: Unveiling Security Risks and
  Building Trust
LLMs and the Future of Chip Design: Unveiling Security Risks and Building Trust
Zeng Wang
Lilas Alrahis
Likhitha Mankali
J. Knechtel
Ozgur Sinanoglu
43
9
0
11 May 2024
Supporting Business Document Workflows via Collection-Centric
  Information Foraging with Large Language Models
Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models
Raymond Fok
Nedim Lipka
Tong Sun
Alexa F. Siu
28
6
0
02 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
95
139
0
29 Apr 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
57
9
0
25 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
49
8
0
29 Feb 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM
  Context Fusion
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
Ziyue Wang
Chi Chen
Yiqi Zhu
Fuwen Luo
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Maosong Sun
Yang Liu
40
5
0
19 Feb 2024
Large Language Models in Education: Vision and Opportunities
Large Language Models in Education: Vision and Opportunities
Wensheng Gan
Zhenlian Qi
Jiayang Wu
Chun-Wei Lin
AI4Ed
41
70
0
22 Nov 2023
Large Language Models for Robotics: A Survey
Large Language Models for Robotics: A Survey
Fanlong Zeng
Wensheng Gan
Yongheng Wang
Ning Liu
Philip S. Yu
LM&Ro
124
125
0
13 Nov 2023
Model-as-a-Service (MaaS): A Survey
Model-as-a-Service (MaaS): A Survey
Wensheng Gan
Shicheng Wan
Philip S. Yu
23
21
0
10 Nov 2023
Retrieval-based Knowledge Augmented Vision Language Pre-training
Retrieval-based Knowledge Augmented Vision Language Pre-training
Jiahua Rao
Zifei Shan
Long Liu
Yao Zhou
Yuedong Yang
VLM
88
13
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
270
4,244
0
30 Jan 2023
Visual Concepts Tokenization
Visual Concepts Tokenization
Tao Yang
Yuwang Wang
Yan Lu
Nanning Zheng
OCL
ViT
38
12
0
20 May 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text
  Understanding
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
259
558
0
28 Sep 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
174
402
0
10 Sep 2021
Efficient Estimation of Word Representations in Vector Space
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
242
31,257
0
16 Jan 2013
1