A Survey on Multimodal Large Language Models for Autonomous Driving


21 November 2023
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
Kaizhao Liang
Jintai Chen
Juanwu Lu
Zichong Yang
Kuei-Da Liao
Tianren Gao
Erlong Li
Kun Tang
Zhipeng Cao
Tongxi Zhou
Ao Liu
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng

Papers citing "A Survey on Multimodal Large Language Models for Autonomous Driving"

50 / 101 papers shown
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation
Bolin Lai
Miao Liu
Fiona Ryan
James M. Rehg
ViT
81
37
0
08 Aug 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
119
287
0
13 Jul 2022
Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang
F. Xia
Ted Xiao
Harris Chan
Jacky Liang
...
Tomas Jackson
Linda Luu
Sergey Levine
Karol Hausman
Brian Ichter
LLMAG, LM&Ro, LRM
137
922
0
12 Jul 2022
AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation
Xu Cao
Xiaoye Li
Liya Ma
Yi Huang
X. Feng
Zening Chen
H. Zeng
Jianguo Cao
ViT
56
21
0
11 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM, VLM
418
3,610
0
29 Apr 2022
Correcting Robot Plans with Natural Language Feedback
Pratyusha Sharma
Balakumar Sundaralingam
Valts Blukis
Chris Paxton
Tucker Hermans
Antonio Torralba
Jacob Andreas
Dieter Fox
3DV, LM&Ro
81
93
0
11 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM, LRM
537
6,301
0
05 Apr 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
195
1,988
0
04 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM, LRM
162
588
0
01 Apr 2022
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Sewon Min
Xinxi Lyu
Ari Holtzman
Mikel Artetxe
M. Lewis
Hannaneh Hajishirzi
Luke Zettlemoyer
LLMAG, LRM
191
1,501
0
25 Feb 2022
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
Eric Jang
A. Irpan
Mohi Khansari
Daniel Kappler
F. Ebert
Corey Lynch
Sergey Levine
Chelsea Finn
LM&Ro
263
550
0
04 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
856
9,714
0
28 Jan 2022
CLIPort: What and Where Pathways for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
130
661
0
24 Sep 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM, UQCV
254
3,789
0
03 Sep 2021
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
Suraj Nair
E. Mitchell
Kevin Chen
Brian Ichter
Silvio Savarese
Chelsea Finn
LM&Ro, OffRL
119
160
0
02 Sep 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM, MLLM
136
799
0
24 Aug 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
188
789
0
25 Jun 2021
Prevent the Language Model from being Overconfident in Neural Machine Translation
Mengqi Miao
Fandong Meng
Yijin Liu
Xiao-Hua Zhou
Jie Zhou
73
42
0
24 May 2021
Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
Sangmin Lee
Hak Gu Kim
Dae Hwi Choi
Hyungil Kim
Yong Man Ro
81
102
0
02 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
420
5,005
0
24 Feb 2021
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
127
422
0
14 Nov 2020
Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
Alexander Ku
Peter Anderson
Roma Patel
Eugene Ie
Jason Baldridge
104
315
0
15 Oct 2020
Multimodal Safety-Critical Scenarios Generation for Decision-Making Algorithms Evaluation
Wenhao Ding
Baiming Chen
Yue Liu
Kim Ji Eun
Ding Zhao
AAML
84
104
0
16 Sep 2020
PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards
Prasoon Goyal
S. Niekum
Raymond J. Mooney
LM&Ro
73
54
0
30 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
154
375
0
29 Jun 2020
Language Conditioned Imitation Learning over Unstructured Data
Corey Lynch
P. Sermanet
LM&Ro
84
251
0
15 May 2020
OccuSeg: Occupancy-aware 3D Instance Segmentation
Lei Han
Tian Zheng
Lan Xu
Lu Fang
3DPC
253
261
0
14 Mar 2020
Deep Reinforcement Learning for Autonomous Driving: A Survey
B. R. Kiran
Ibrahim Sobh
V. Talpaert
Patrick Mannion
A. A. Sallab
S. Yogamani
P. Pérez
358
1,693
0
02 Feb 2020
A Survey of Deep Learning Applications to Autonomous Vehicle Control
Sampo Kuutti
Richard Bowden
Yaochu Jin
P. Barber
Saber Fallah
111
520
0
23 Dec 2019
Scalability in Perception for Autonomous Driving: Waymo Open Dataset
Pei Sun
Henrik Kretzschmar
Xerxes Dotiwalla
Aurelien Chouard
Vijaysai Patnaik
...
Shuyang Cheng
Yu Zhang
Jonathon Shlens
Zhifeng Chen
Dragomir Anguelov
152
2,907
0
10 Dec 2019
Argoverse: 3D Tracking and Forecasting with Rich Maps
Ming-Fang Chang
John Lambert
Patsorn Sangkloy
Jagjeet Singh
Sławomir Bąk
...
De Wang
Peter Carr
Simon Lucey
Deva Ramanan
James Hays
3DPC
151
1,298
0
06 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
506
20,376
0
23 Oct 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
155
1,967
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL, VLM
255
3,699
0
06 Aug 2019
Language as an Abstraction for Hierarchical Deep Reinforcement Learning
Yiding Jiang
S. Gu
Kevin Patrick Murphy
Chelsea Finn
OffRL
67
225
0
18 Jun 2019
A Survey of Autonomous Driving: Common Practices and Emerging Technologies
Ekim Yurtsever
Jacob Lambert
Alexander Carballo
K. Takeda
93
1,396
0
12 Jun 2019
A Survey of Reinforcement Learning Informed by Natural Language
Jelena Luketina
Nantas Nardelli
Gregory Farquhar
Jakob N. Foerster
Jacob Andreas
Edward Grefenstette
Shimon Whiteson
Tim Rocktaschel
LM&Ro, KELM, OffRL, LRM
101
282
0
10 Jun 2019
nuScenes: A multimodal dataset for autonomous driving
Holger Caesar
Varun Bankiti
Alex H. Lang
Sourabh Vora
Venice Erin Liong
Qiang Xu
Anush Krishnan
Yuxin Pan
G. Baldan
Oscar Beijbom
3DPC
301
5,790
0
26 Mar 2019
Learning to Drive in a Day
Alex Kendall
Jeffrey Hawke
David Janz
Przemyslaw Mazur
Daniele Reda
John M. Allen
Vinh-Dieu Lam
Alex Bewley
Amar Shah
111
658
0
01 Jul 2018
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
127
906
0
23 May 2017
Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art
J. Janai
Fatma Guney
Aseem Behl
Andreas Geiger
158
797
0
18 Apr 2017
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
C. Qi
Hao Su
Kaichun Mo
Leonidas Guibas
3DH, 3DPC, 3DV, PINN
500
14,384
0
02 Dec 2016
Volumetric and Multi-View CNNs for Object Classification on 3D Data
C. Qi
Hao Su
Matthias Niessner
Angela Dai
Mengyuan Yan
Leonidas Guibas
3DPC, 3DV
247
1,567
0
12 Apr 2016
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
238
5,512
0
03 May 2015
Deep Multimodal Learning for Audio-Visual Speech Recognition
Youssef Mroueh
E. Marcheret
Vaibhava Goel
62
227
0
22 Jan 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
190
1,241
0
20 Dec 2014
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung
Çağlar Gülçehre
Kyunghyun Cho
Yoshua Bengio
607
12,745
0
11 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
154
5,595
0
07 Dec 2014
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
265
6,042
0
17 Nov 2014
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever
Oriol Vinyals
Quoc V. Le
AIMat
450
20,606
0
10 Sep 2014