ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.17685
  4. Cited By

FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving

23 May 2025
Shuang Zeng
Xinyuan Chang
Mengwei Xie
Xinran Liu
Yifan Bai
Zheng Pan
Mu Xu
Xing Wei
    LRM
ArXivPDFHTML

Papers citing "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving"

50 / 59 papers shown
Title
LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving
LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving
Zhijie Qiao
Haowei Li
Zhong Cao
Henry X. Liu
VLM
101
11
0
01 May 2025
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Qingqing Zhao
Yao Lu
Moo Jin Kim
Zipeng Fu
Zhuoyang Zhang
...
Ankur Handa
Xuan Li
Donglai Xiang
Gordon Wetzstein
Nayeon Lee
LM&Ro
LRM
61
21
0
27 Mar 2025
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
Fahad Shahbaz Khan
Salman Khan
Hisham Cholakkal
Rao Muhammad Anwer
69
1
0
18 Mar 2025
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Jingcheng Ni
Yuxin Guo
Yichen Liu
Rui Chen
Lewei Lu
Z. Wu
DiffM
VGen
98
4
0
17 Feb 2025
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
Xiang Li
Pengfei Li
Yupeng Zheng
Wei Sun
Yan Wang
Yilun Chen
3DPC
97
2
0
11 Feb 2025
Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models
Tianshuo Xu
Hao Lu
Xu Yan
Yingjie Cai
Bingbing Liu
Yingcong Chen
38
3
0
10 Feb 2025
DriveLM: Driving with Graph Visual Question Answering
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima
Katrin Renz
Kashyap Chitta
Lawrence Yunliang Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
118
182
0
17 Jan 2025
DrivingGPT: Unifying Driving World Modeling and Planning with
  Multi-modal Autoregressive Transformers
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen
Yuqi Wang
Zhaoxiang Zhang
315
8
0
24 Dec 2024
MetaMorph: Multimodal Understanding and Generation via Instruction
  Tuning
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong
David Fan
Jiachen Zhu
Yunyang Xiong
Xinlei Chen
Koustuv Sinha
Michael G. Rabbat
Yann LeCun
Saining Xie
Zhuang Liu
VLM
80
33
0
18 Dec 2024
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained
  Ego-Motion, Object Dynamics, and Scene Composition Control
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
Mariam Hassan
Sebastian Stapf
Ahmad Rahimi
Pedro M B Rezende
Yasaman Haghighi
...
Mathieu Salzmann
Davide Scaramuzza
Marc Pollefeys
Paolo Favaro
Alexandre Alahi
VLM
VGen
110
7
0
15 Dec 2024
Doe-1: Closed-Loop Autonomous Driving with Large World Model
Doe-1: Closed-Loop Autonomous Driving with Large World Model
Wenzhao Zheng
Zetian Xia
Yuanhui Huang
Sicheng Zuo
Jie Zhou
Jiwen Lu
OffRL
3DV
VLM
80
4
0
12 Dec 2024
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
Chunwei Wang
Guansong Lu
Junwei Yang
Runhui Huang
Jiawei Han
Lu Hou
Wei Zhang
Hang Xu
MLLM
73
11
0
09 Dec 2024
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via
  Online Restoration
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
Chaojun Ni
Guosheng Zhao
Xiaofeng Wang
Zheng Hua Zhu
Wenkang Qin
...
Kun Zhan
Peng Jia
Xianpeng Lang
Xingang Wang
Wenjun Mei
VGen
258
9
0
29 Nov 2024
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Xinyuan Chang
Maixuan Xue
Xinran Liu
Zheng Pan
Xing Wei
81
2
0
31 Oct 2024
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding
  and Generation
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Chengyue Wu
Xiaokang Chen
Z. F. Wu
Yiyang Ma
Xingchao Liu
...
Wen Liu
Zhenda Xie
Xingkai Yu
Chong Ruan
Ping Luo
AI4TS
88
89
0
17 Oct 2024
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving
  Scene Representation
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
Guosheng Zhao
Chaojun Ni
Xiaofeng Wang
Zheng Zhu
Xinming Zhang
...
Xinze Chen
Boyuan Wang
Youyi Zhang
Wenjun Mei
Xingang Wang
VGen
83
29
0
17 Oct 2024
Driving with Prior Maps: Unified Vector Prior Encoding for Autonomous
  Vehicle Mapping
Driving with Prior Maps: Unified Vector Prior Encoding for Autonomous Vehicle Mapping
Shuang Zeng
Xinyuan Chang
Xinran Liu
Zheng Pan
Xing Wei
73
2
0
09 Sep 2024
Making Large Language Models Better Planners with Reasoning-Decision
  Alignment
Making Large Language Models Better Planners with Reasoning-Decision Alignment
Zhijian Huang
Tao Tang
Shaoxiang Chen
Sihao Lin
Zequn Jie
Lin Ma
Guangrun Wang
Xiaodan Liang
81
14
0
25 Aug 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and
  Generation
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
71
181
0
22 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
90
53
0
05 Aug 2024
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual
  Question Answering for Autonomous Driving
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
Peiru Zheng
Yun Zhao
Zhan Gong
Hong Zhu
Shaohua Wu
MLLM
58
8
0
31 Jul 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image
  Generation
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
73
253
0
10 Jun 2024
Vista: A Generalizable Driving World Model with High Fidelity and
  Versatile Controllability
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
Shenyuan Gao
Jiazhi Yang
Li Chen
Kashyap Chitta
Yihang Qiu
Andreas Geiger
Jun Zhang
Hongyang Li
78
83
0
27 May 2024
Continuously Learning, Adapting, and Improving: A Dual-Process Approach
  to Autonomous Driving
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
Jianbiao Mei
Yukai Ma
Xuemeng Yang
Licheng Wen
Xinyu Cai
...
Min Dou
Botian Shi
Liang He
Yong-Jin Liu
Yu Qiao
60
13
0
24 May 2024
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for
  Autonomous Vehicles
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang
Chejian Xu
Yue Liu
68
40
0
22 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
96
287
0
16 May 2024
DriveWorld: 4D Pre-trained Scene Understanding via World Models for
  Autonomous Driving
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Chen Min
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Xinli Xu
...
Yulan Guo
Junliang Xing
Liping Jing
Yiming Nie
Bin Dai
VGen
VLM
39
30
0
07 May 2024
Language-Image Models with 3D Understanding
Language-Image Models with 3D Understanding
Jang Hyun Cho
Boris Ivanovic
Yulong Cao
Edward Schmerling
Yue Wang
...
Boyi Li
Yurong You
Philipp Krahenbuhl
Yan Wang
Marco Pavone
LRM
50
19
0
06 May 2024
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
Shihao Wang
Zhiding Yu
Xiaohui Jiang
Shiyi Lan
Min Shi
Nadine Chang
Jan Kautz
Ying Li
Jose M. Alvarez
LRM
50
51
0
02 May 2024
Generalized Predictive Model for Autonomous Driving
Generalized Predictive Model for Autonomous Driving
Jiazhi Yang
Shenyuan Gao
Yihang Qiu
Li Chen
Tianyu Li
...
Ping Luo
Jun Zhang
Andreas Geiger
Yu Qiao
Hongyang Li
VGen
73
60
0
14 Mar 2024
DriveVLM: The Convergence of Autonomous Driving and Large
  Vision-Language Models
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
Xiaoyu Tian
Junru Gu
Bailin Li
Yicheng Liu
Yang Wang
Chenxu Hu
Kun Zhan
Peng Jia
Xianpeng Lang
Hang Zhao
VLM
78
138
0
19 Feb 2024
Driving Everywhere with Large Language Model Policy Adaptation
Driving Everywhere with Large Language Model Policy Adaptation
Boyi Li
Yue Wang
Jiageng Mao
Boris Ivanovic
Sushant Veer
Karen Leung
Marco Pavone
53
32
0
08 Feb 2024
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang
Li Chen
Yanan Sun
Hongyang Li
3DPC
54
45
0
29 Dec 2023
Generative Multimodal Models are In-Context Learners
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
LRM
81
258
0
20 Dec 2023
Reason2Drive: Towards Interpretable and Chain-based Reasoning for
  Autonomous Driving
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
Ming-Jun Nie
Renyuan Peng
Chunwei Wang
Xinyue Cai
Jianhua Han
Hang Xu
Li Zhang
LRM
55
54
0
06 Dec 2023
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
Zhiqi Li
Zhiding Yu
Shiyi Lan
Jiahan Li
Jan Kautz
Tong Lu
Jose M. Alvarez
49
76
0
05 Dec 2023
Dolphins: Multimodal Language Model for Driving
Dolphins: Multimodal Language Model for Driving
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
68
54
0
01 Dec 2023
Driving into the Future: Multiview Visual Forecasting and Planning with
  World Model for Autonomous Driving
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Yu-Quan Wang
Jiawei He
Lue Fan
Hongxin Li
Yuntao Chen
Zhaoxiang Zhang
VGen
75
123
0
29 Nov 2023
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
Wenzhao Zheng
Weiliang Chen
Yuanhui Huang
Borui Zhang
Yueqi Duan
Jiwen Lu
VGen
71
75
0
27 Nov 2023
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large
  Language Model
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
Zhenhua Xu
Yujia Zhang
Enze Xie
Zhen Zhao
Yong Guo
Kwan-Yee. K. Wong
Zhenguo Li
Hengshuang Zhao
MLLM
39
271
0
02 Oct 2023
GAIA-1: A Generative World Model for Autonomous Driving
GAIA-1: A Generative World Model for Autonomous Driving
Masane Fuchi
Lloyd Russell
Hudson Yeo
Zak Murez
Hiroto Minami
Alex Kendall
Tomohiro Takagi
Gianluca Corrado
VGen
46
227
0
29 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
50
185
0
20 Sep 2023
DriveDreamer: Towards Real-world-driven World Models for Autonomous
  Driving
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
Xiaofeng Wang
Zheng Hua Zhu
Guan Huang
Xinze Chen
Jiagang Zhu
Jiwen Lu
VGen
43
154
0
18 Sep 2023
Planting a SEED of Vision in Large Language Model
Planting a SEED of Vision in Large Language Model
Yuying Ge
Yixiao Ge
Ziyun Zeng
Xintao Wang
Ying Shan
VLM
MLLM
16
94
0
16 Jul 2023
Emu: Generative Pretraining in Multimodality
Emu: Generative Pretraining in Multimodality
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
45
129
0
11 Jul 2023
What does CLIP know about a red circle? Visual prompt engineering for
  VLMs
What does CLIP know about a red circle? Visual prompt engineering for VLMs
Aleksandar Shtedritski
Christian Rupprecht
Andrea Vedaldi
VLM
MLLM
39
148
0
13 Apr 2023
VAD: Vectorized Scene Representation for Efficient Autonomous Driving
VAD: Vectorized Scene Representation for Efficient Autonomous Driving
Bo Jiang
Shaoyu Chen
Qing Xu
Bencheng Liao
Jiajie Chen
Helong Zhou
Qian Zhang
Wenyu Liu
Chang Huang
Xinggang Wang
110
206
0
21 Mar 2023
Planning-oriented Autonomous Driving
Planning-oriented Autonomous Driving
Yi Hu
Jiazhi Yang
Li Chen
Keyu Li
Chonghao Sima
...
Xiaosong Jia
Qiang Liu
Jifeng Dai
Yu Qiao
Hongyang Li
52
613
0
20 Dec 2022
Model-Based Imitation Learning for Urban Driving
Model-Based Imitation Learning for Urban Driving
Anthony Hu
Gianluca Corrado
Nicolas Griffiths
Zak Murez
Corina Gurau
Hudson Yeo
Alex Kendall
R. Cipolla
Jamie Shotton
117
138
0
14 Oct 2022
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Chuanxia Zheng
L. Vuong
Jianfei Cai
Dinh Q. Phung
MQ
88
74
0
19 Sep 2022
12
Next