Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1801.01582
Cited By
Object Referring in Videos with Language and Human Gaze
4 January 2018
A. Vasudevan
Dengxin Dai
Luc Van Gool
VOS
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Object Referring in Videos with Language and Human Gaze"
19 / 19 papers shown
Title
ChatBEV: A Visual Language Model that Understands BEV Maps
Qingyao Xu
Tian Jin
Guang Chen
Yanfeng Wang
Yujie Zhang
51
0
0
18 Mar 2025
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
Rasoul Shafipour
David Harrison
Maxwell Horton
Jeffrey Marker
Houman Bedayat
Sachin Mehta
Mohammad Rastegari
Mahyar Najibi
Saman Naderiparizi
MQ
57
0
0
14 Oct 2024
Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation
Swati Jindal
Mohit Yadav
Roberto Manduchi
37
5
0
08 Apr 2024
Multi-Modal Gaze Following in Conversational Scenarios
Yuqi Hou
Zhongqun Zhang
Nora Horanyi
Jaewon Moon
Yihua Cheng
Hyung Jin Chang
21
5
0
09 Nov 2023
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving
Tushar Choudhary
Vikrant Dewangan
Shivam Chandhok
Shubham Priyadarshan
Anushka Jain
A. K. Singh
Siddharth Srivastava
Krishna Murthy Jatavallabhula
K. M. Krishna
50
59
0
03 Oct 2023
Language Prompt for Autonomous Driving
Dongming Wu
Wencheng Han
Tiancai Wang
Yingfei Liu
Cheng-zhong Xu
Jianbing Shen
Jianbing Shen
VLM
44
73
0
08 Sep 2023
Referring Multi-Object Tracking
Dongming Wu
Wencheng Han
Tiancai Wang
Xingping Dong
Xiangyu Zhang
Jianbing Shen
40
71
0
06 Mar 2023
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
30
94
0
30 Mar 2022
End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding
Meng Li
Tianbao Wang
Haoyu Zhang
Shengyu Zhang
Zhou Zhao
...
Wenming Tan
Jin Wang
Peng Wang
Shi Pu
Fei Wu
21
45
0
15 Mar 2022
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations?
Thierry Deruyttere
Victor Milewski
Marie-Francine Moens
30
15
0
08 Jun 2021
Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze
Ece Takmaz
Sandro Pezzelle
Lisa Beinborn
Raquel Fernández
35
22
0
09 Nov 2020
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary
Thierry Deruyttere
Simon Vandenhende
Dusan Grujicic
Yu Liu
Luc Van Gool
Matthew Blaschko
Tinne Tuytelaars
Marie-Francine Moens
30
6
0
18 Sep 2020
Visual Relation Grounding in Videos
Junbin Xiao
Xindi Shang
Xun Yang
Sheng Tang
Tat-Seng Chua
20
40
0
17 Jul 2020
Talk2Car: Taking Control of Your Self-Driving Car
Thierry Deruyttere
Simon Vandenhende
Dusan Grujicic
Luc Van Gool
Marie-Francine Moens
LM&Ro
28
124
0
24 Sep 2019
Searching for Ambiguous Objects in Videos using Relational Referring Expressions
Hazan Anayurt
Sezai Artun Ozyegin
Ulfet Cetin
Utku Aktaş
Sinan Kalkan
19
9
0
03 Aug 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
20
132
0
22 Jul 2019
Learning Accurate, Comfortable and Human-like Driving
Simon Hecker
Dengxin Dai
Luc Van Gool
27
29
0
26 Mar 2019
TVQA: Localized, Compositional Video Question Answering
Muhammad Abdul Wahab
Licheng Yu
Mounir Nasr Allah
Tamara L. Berg
36
617
0
05 Sep 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,464
0
06 Jun 2016
1