Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.12670
Cited By
TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
19 May 2025
Lihong Chen
Hossein Hassani
Soodeh Nikan
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning"
17 / 17 papers shown
Title
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima
Katrin Renz
Kashyap Chitta
Lawrence Yunliang Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
203
198
0
17 Jan 2025
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM
LRM
101
47
0
19 Mar 2024
A Survey for Foundation Models in Autonomous Driving
Haoxiang Gao
Yaqian Li
Kaiwen Long
Ming Yang
Yiqing Shen
VLM
LRM
92
31
0
02 Feb 2024
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao
Yuxuan Hu
Letian Wang
Steven L. Waslander
Yu Liu
Hongsheng Li
ELM
94
132
0
12 Dec 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
92
204
0
03 Oct 2023
End-to-end Autonomous Driving: Challenges and Frontiers
Li Chen
Peng Wu
Kashyap Chitta
Bernhard Jaeger
Andreas Geiger
Hongyang Li
3DV
122
307
0
29 Jun 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
426
4,563
0
30 Jan 2023
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP
Runnan Chen
Youquan Liu
Lingdong Kong
Xinge Zhu
Yuexin Ma
Yikang Li
Yuenan Hou
Yu Qiao
Wenping Wang
CLIP
3DPC
88
147
0
12 Jan 2023
OpenScene: 3D Scene Understanding with Open Vocabularies
Songyou Peng
Kyle Genova
ChiyuMaxJiang
Andrea Tagliasacchi
Marc Pollefeys
Thomas Funkhouser
3DPC
VLM
92
363
0
28 Nov 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
416
3,542
0
29 Apr 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
542
4,360
0
28 Jan 2022
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
99
710
0
08 Dec 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
133
799
0
24 Aug 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
955
29,436
0
26 Feb 2021
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
160
1,666
0
22 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
144
1,955
0
09 Aug 2019
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
295
4,488
0
20 Nov 2014
1