arXiv:2311.12320
A Survey on Multimodal Large Language Models for Autonomous Driving

21 November 2023
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
Kaizhao Liang
Jintai Chen
Juanwu Lu
Zichong Yang
Kuei-Da Liao
Tianren Gao
Erlong Li
Kun Tang
Zhipeng Cao
Tongxi Zhou
Ao Liu
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng

Papers citing "A Survey on Multimodal Large Language Models for Autonomous Driving"

50 / 101 papers shown
AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
Kangan Qian
Sicong Jiang
Yang Zhong
Ziang Luo
Zilin Huang
...
Yifei Hu
Guang Li
Guang Chen
Hao Ye
Lijun Sun
LRM
81
1
0
21 May 2025
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Baoxia Du
H. Du
Dusit Niyato
Ruidong Li
152
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
293
1
0
05 May 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Weinan Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro, LRM
177
9
0
27 Mar 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen Zhong
Hanjie Chen
OffRL, ReLM, LRM
200
100
0
20 Mar 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Peijie Wang
Zhong-Zhi Li
Fei Yin
Xin Yang
Dekang Ran
Cheng-Lin Liu
LRM
125
11
0
28 Feb 2025
TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning
Chengkai Xu
Jiaqi Liu
Shiyu Fang
Jian Sun
Dong Chen
Peng Hang
Jian Sun
206
1
0
21 Feb 2025
Boosting Multimodal Reasoning with Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
Jianhua Tao
LRM
210
11
0
04 Feb 2025
SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset
Goodarz Mehr
A. Eskandarian
311
2
0
04 Feb 2025
E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection
Jiaqing Zhang
Mingxiang Cao
Weiying Xie
Jie Lei
Daixun Li
Wenbo Huang
Yunsong Li
Xue Yang
126
6
0
28 Jan 2025
CoDriveVLM: VLM-Enhanced Urban Cooperative Dispatching and Motion Planning for Future Autonomous Mobility on Demand Systems
Haichao Liu
Ruoyu Yao
Wenru Liu
Zhenmin Huang
Shaojie Shen
Jun Ma
74
3
0
10 Jan 2025
Large-scale moral machine experiment on large language models
Muhammad Shahrul Zaim bin Ahmad
Kazuhiro Takemoto
ELM, AILaw
123
3
1
31 Dec 2024
Large Language Model-based Decision-making for COLREGs and the Control of Autonomous Surface Vehicles
Klinsmann Agyei
Pouria Sarhadi
W. Naeem
187
0
0
25 Nov 2024
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang
Junkai Chen
Beier Zhu
Tingjin Luo
Yankun Shen
Xu Yang
159
7
0
23 Nov 2024
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
Chenxi Wang
Xiang Chen
N. Zhang
Bozhong Tian
Haoming Xu
Shumin Deng
Ningyu Zhang
MLLM, LRM
222
10
0
15 Oct 2024
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Xingzhou Lou
Dong Yan
Wei Shen
Yuzi Yan
Jian Xie
Junge Zhang
195
28
0
01 Oct 2024
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Mohammad Shahab Sepehri
Zalan Fabian
Maryam Soltanolkotabi
Mahdi Soltanolkotabi
MedIm
127
6
0
23 Sep 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Zhecan Wang
Junzhang Liu
Chia-Wei Tang
Hani Alomari
Anushka Sivakumar
...
Haoxuan You
A. Ishmam
Kai-Wei Chang
Shih-Fu Chang
Chris Thomas
CoGe, VLM
161
2
0
19 Sep 2024
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
Baichuan Zhou
Haote Yang
Dairong Chen
Junyan Ye
Tianyi Bai
Jinhua Yu
Songyang Zhang
Dahua Lin
Conghui He
Weijia Li
VLM
151
7
0
30 Aug 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
140
56
0
09 Jul 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
164
1
0
04 Jul 2024
Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension
Runwei Guan
Ruixiao Zhang
Ningwei Ouyang
Tao Huang
Ka Lok Man
...
Ming Xu
Jeremy S. Smith
Eng Gee Lim
Yutao Yue
Hui Xiong
190
10
0
21 May 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
229
12
0
25 Mar 2024
LIP-Loc: LiDAR Image Pretraining for Cross-Modal Localization
Sai Shubodh Puligilla
Mohammad Omama
Husain Zaidi
Udit Singh Parihar
Madhava Krishna
79
14
0
27 Dec 2023
Human-Centric Autonomous Systems With LLMs for User Command Reasoning
Yi Yang
Qingwen Zhang
Ci Li
Daniel Simoes Marta
Nazre Batool
John Folkesson
LRM
112
29
0
14 Nov 2023
Advances in Embodied Navigation Using Large Language Models: A Survey
Jinzhou Lin
Han Gao
Xuxiang Feng
Rongtao Xu
Changwei Wang
Man Zhang
Li Guo
Shibiao Xu
LM&Ro, LLMAG
154
10
0
01 Nov 2023
Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning
Dhruv Shah
Michael Equi
B. Osinski
Fei Xia
Brian Ichter
Sergey Levine
3DV, LM&Ro
84
102
0
16 Oct 2023
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
Hao Sha
Yao Mu
Yuxuan Jiang
Li Chen
Chenfeng Xu
Ping Luo
Shengbo Eben Li
Masayoshi Tomizuka
Wei Zhan
Mingyu Ding
256
179
0
04 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
108
207
0
03 Oct 2023
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
Licheng Wen
Daocheng Fu
Xin Li
Xinyu Cai
Tengyu Ma
Pinlong Cai
Min Dou
Botian Shi
Liang He
Yu Qiao
94
163
0
28 Sep 2023
PIE: Simulating Disease Progression via Progressive Image Editing
Kaizhao Liang
Xu Cao
Kuei-Da Liao
Tianren Gao
Wenqian Ye
Zhengyu Chen
Jianguo Cao
Tejas Nama
Jimeng Sun
MedIm, AI4CE
74
5
0
21 Sep 2023
Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving
Ali Keysan
Andreas Look
Eitan Kosman
Gonca Gürsun
Jörg Wagner
Yu Yao
Barbara Rakitsch
85
31
0
11 Sep 2023
Language Prompt for Autonomous Driving
Dongming Wu
Wencheng Han
Tiancai Wang
Yingfei Liu
Cheng-zhong Xu
Jianbing Shen
VLM
112
87
0
08 Sep 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
Ziyu Guo
Renrui Zhang
Xiangyang Zhu
Yiwen Tang
Xianzheng Ma
...
Ke Chen
Peng Gao
Xianzhi Li
Hongsheng Li
Pheng-Ann Heng
MLLM
96
144
0
01 Sep 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
127
518
0
12 Jul 2023
The Waymo Open Sim Agents Challenge
Nico Montali
John Lambert
Paul Mougin
Alex Kuefler
Nick Rhinehart
...
Tristan Emrich
Zoey Yang
Shimon Whiteson
Brandyn White
Drago Anguelov
LLMAG
80
54
0
19 May 2023
M$^2$DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer
Yunsheng Ma
Liangqi Yuan
Amr Abdelraouf
Kyungtae Han
Rohit Gupta
Zihao Li
Ziran Wang
134
10
0
13 May 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
Yongliang Shen
Kaitao Song
Xu Tan
Dongsheng Li
Weiming Lu
Yueting Zhuang
MLLM
132
911
0
30 Mar 2023
Milestones in Autonomous Driving and Intelligent Vehicles: Survey of Surveys
Long Chen
Yuchen Li
Chao Huang
Bai Li
Yang Xing
...
Chen Lv
Jinjun Wang
Dongpu Cao
N. Zheng
Feiyue Wang
112
322
0
30 Mar 2023
V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception
Runsheng Xu
Xin Xia
Jinlong Li
Hanzhao Li
Shuo Zhang
...
Xiaoyu Dong
Rui Song
Hongkai Yu
Bolei Zhou
Jiaqi Ma
133
161
0
14 Mar 2023
Adding Conditional Control to Text-to-Image Diffusion Models
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
184
4,180
1
10 Feb 2023
THMA: Tencent HD Map AI System for Creating HD Map Annotations
Kun Tang
Xu Cao
Zhipeng Cao
Tongxi Zhou
Erlong Li
...
Shengtao Zou
Chang-ling Liu
Shuqi Mei
Elena Sizikova
Chao Zheng
59
12
0
14 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
101
45
0
09 Dec 2022
ViTASD: Robust Vision Transformer Baselines for Autism Spectrum Disorder Facial Diagnosis
Xu Cao
Wenqian Ye
Elena Sizikova
Xue Bai
Megan Coffee
H. Zeng
Jianguo Cao
49
17
0
30 Oct 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM, LRM
234
3,165
0
20 Oct 2022
Large-scale Text-to-Image Generation Models for Visual Artists' Creative Works
Hyung-Kwon Ko
Gwanmo Park
Hyeon Jeon
Jaemin Jo
Juho Kim
Jinwook Seo
95
140
0
16 Oct 2022
A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Aishwarya Kamath
Peter Anderson
Su Wang
Jing Yu Koh
Alexander Ku
Austin Waters
Yinfei Yang
Jason Baldridge
Zarana Parekh
LM&Ro
95
48
0
06 Oct 2022
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
Ishika Singh
Valts Blukis
Arsalan Mousavian
Ankit Goyal
Danfei Xu
Jonathan Tremblay
Dieter Fox
Jesse Thomason
Animesh Garg
LM&Ro, LLMAG
177
657
0
22 Sep 2022
DRAMA: Joint Risk Localization and Captioning in Driving
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
162
99
0
22 Sep 2022
ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection
Yunsheng Ma
Ziran Wang
ViT
115
15
0
19 Sep 2022