Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.12320
Cited By
A Survey on Multimodal Large Language Models for Autonomous Driving
21 November 2023
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
Kaizhao Liang
Jintai Chen
Juanwu Lu
Zichong Yang
Kuei-Da Liao
Tianren Gao
Erlong Li
Kun Tang
Zhipeng Cao
Tongxi Zhou
Ao Liu
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"A Survey on Multimodal Large Language Models for Autonomous Driving"
50 / 101 papers shown
Title
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation
Bolin Lai
Miao Liu
Fiona Ryan
James M. Rehg
ViT
81
37
0
08 Aug 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
119
287
0
13 Jul 2022
Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang
F. Xia
Ted Xiao
Harris Chan
Jacky Liang
...
Tomas Jackson
Linda Luu
Sergey Levine
Karol Hausman
Brian Ichter
LLMAG
LM&Ro
LRM
137
922
0
12 Jul 2022
AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation
Xu Cao
Xiaoye Li
Liya Ma
Yi Huang
X. Feng
Zening Chen
H. Zeng
Jianguo Cao
ViT
56
21
0
11 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
418
3,610
0
29 Apr 2022
Correcting Robot Plans with Natural Language Feedback
Pratyusha Sharma
Balakumar Sundaralingam
Valts Blukis
Chris Paxton
Tucker Hermans
Antonio Torralba
Jacob Andreas
Dieter Fox
3DV
LM&Ro
81
93
0
11 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
537
6,301
0
05 Apr 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
195
1,988
0
04 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
162
588
0
01 Apr 2022
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Sewon Min
Xinxi Lyu
Ari Holtzman
Mikel Artetxe
M. Lewis
Hannaneh Hajishirzi
Luke Zettlemoyer
LLMAG
LRM
191
1,501
0
25 Feb 2022
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
Eric Jang
A. Irpan
Mohi Khansari
Daniel Kappler
F. Ebert
Corey Lynch
Sergey Levine
Chelsea Finn
LM&Ro
263
550
0
04 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
856
9,714
0
28 Jan 2022
CLIPort: What and Where Pathways for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
130
661
0
24 Sep 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
254
3,789
0
03 Sep 2021
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
Suraj Nair
E. Mitchell
Kevin Chen
Brian Ichter
Silvio Savarese
Chelsea Finn
LM&Ro
OffRL
119
160
0
02 Sep 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
136
799
0
24 Aug 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
188
789
0
25 Jun 2021
Prevent the Language Model from being Overconfident in Neural Machine Translation
Mengqi Miao
Fandong Meng
Yijin Liu
Xiao-Hua Zhou
Jie Zhou
73
42
0
24 May 2021
Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
Sangmin Lee
Hak Gu Kim
Dae Hwi Choi
Hyungil Kim
Yong Man Ro
81
102
0
02 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
420
5,005
0
24 Feb 2021
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
127
422
0
14 Nov 2020
Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
Alexander Ku
Peter Anderson
Roma Patel
Eugene Ie
Jason Baldridge
104
315
0
15 Oct 2020
Multimodal Safety-Critical Scenarios Generation for Decision-Making Algorithms Evaluation
Wenhao Ding
Baiming Chen
Yue Liu
Kim Ji Eun
Ding Zhao
AAML
84
104
0
16 Sep 2020
PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards
Prasoon Goyal
S. Niekum
Raymond J. Mooney
LM&Ro
73
54
0
30 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
154
375
0
29 Jun 2020
Language Conditioned Imitation Learning over Unstructured Data
Corey Lynch
P. Sermanet
LM&Ro
84
251
0
15 May 2020
OccuSeg: Occupancy-aware 3D Instance Segmentation
Lei Han
Tian Zheng
Lan Xu
Lu Fang
3DPC
253
261
0
14 Mar 2020
Deep Reinforcement Learning for Autonomous Driving: A Survey
B. R. Kiran
Ibrahim Sobh
V. Talpaert
Patrick Mannion
A. A. Sallab
S. Yogamani
P. Pérez
358
1,693
0
02 Feb 2020
A Survey of Deep Learning Applications to Autonomous Vehicle Control
Sampo Kuutti
Richard Bowden
Yaochu Jin
P. Barber
Saber Fallah
111
520
0
23 Dec 2019
Scalability in Perception for Autonomous Driving: Waymo Open Dataset
Pei Sun
Henrik Kretzschmar
Xerxes Dotiwalla
Aurelien Chouard
Vijaysai Patnaik
...
Shuyang Cheng
Yu Zhang
Jonathon Shlens
Zhifeng Chen
Dragomir Anguelov
152
2,907
0
10 Dec 2019
Argoverse: 3D Tracking and Forecasting with Rich Maps
Ming-Fang Chang
John Lambert
Patsorn Sangkloy
Jagjeet Singh
Sławomir Bąk
...
De Wang
Peter Carr
Simon Lucey
Deva Ramanan
James Hays
3DPC
151
1,298
0
06 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
506
20,376
0
23 Oct 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
155
1,967
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
255
3,699
0
06 Aug 2019
Language as an Abstraction for Hierarchical Deep Reinforcement Learning
Yiding Jiang
S. Gu
Kevin Patrick Murphy
Chelsea Finn
OffRL
67
225
0
18 Jun 2019
A Survey of Autonomous Driving: Common Practices and Emerging Technologies
Ekim Yurtsever
Jacob Lambert
Alexander Carballo
K. Takeda
93
1,396
0
12 Jun 2019
A Survey of Reinforcement Learning Informed by Natural Language
Jelena Luketina
Nantas Nardelli
Gregory Farquhar
Jakob N. Foerster
Jacob Andreas
Edward Grefenstette
Shimon Whiteson
Tim Rocktaschel
LM&Ro
KELM
OffRL
LRM
101
282
0
10 Jun 2019
nuScenes: A multimodal dataset for autonomous driving
Holger Caesar
Varun Bankiti
Alex H. Lang
Sourabh Vora
Venice Erin Liong
Qiang Xu
Anush Krishnan
Yuxin Pan
G. Baldan
Oscar Beijbom
3DPC
301
5,790
0
26 Mar 2019
Learning to Drive in a Day
Alex Kendall
Jeffrey Hawke
David Janz
Przemyslaw Mazur
Daniele Reda
John M. Allen
Vinh-Dieu Lam
Alex Bewley
Amar Shah
111
658
0
01 Jul 2018
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
127
906
0
23 May 2017
Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art
J. Janai
Fatma Guney
Aseem Behl
Andreas Geiger
158
797
0
18 Apr 2017
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
C. Qi
Hao Su
Kaichun Mo
Leonidas Guibas
3DH
3DPC
3DV
PINN
500
14,384
0
02 Dec 2016
Volumetric and Multi-View CNNs for Object Classification on 3D Data
C. Qi
Hao Su
Matthias Niessner
Angela Dai
Mengyuan Yan
Leonidas Guibas
3DPC
3DV
247
1,567
0
12 Apr 2016
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
238
5,512
0
03 May 2015
Deep Multimodal Learning for Audio-Visual Speech Recognition
Youssef Mroueh
E. Marcheret
Vaibhava Goel
62
227
0
22 Jan 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
190
1,241
0
20 Dec 2014
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung
Çağlar Gülçehre
Kyunghyun Cho
Yoshua Bengio
607
12,745
0
11 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
154
5,595
0
07 Dec 2014
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
265
6,042
0
17 Nov 2014
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever
Oriol Vinyals
Quoc V. Le
AIMat
450
20,606
0
10 Sep 2014
Previous
1
2
3
Next