Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2007.09554
Cited By
Referring Expression Comprehension: A Survey of Methods and Datasets
19 July 2020
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Referring Expression Comprehension: A Survey of Methods and Datasets"
50 / 55 papers shown
Title
Human-like compositional learning of visually-grounded concepts using synthetic environments
Zijun Lin
M Ganesh Kumar
Cheston Tan
OCL
CoGe
70
0
0
09 Apr 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
C. L. P. Chen
Zonghao Guo
Derek F. Wong
Xiaoyi Feng
Maosong Sun
VLM
LRM
76
0
0
17 Mar 2025
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
Zilun Zhang
Haozhan Shen
Tiancheng Zhao
Bin Chen
Zian Guan
Yuhao Wang
Xu Jia
Yuxiang Cai
Yongheng Shang
Jianwei Yin
54
0
0
16 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
148
0
0
11 Mar 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
48
0
0
24 Feb 2025
Accounting for Focus Ambiguity in Visual Questions
Chongyan Chen
Yu-Yun Tseng
Zhuoheng Li
Anush Venkatesh
Danna Gurari
41
0
0
04 Jan 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
55
3
0
31 Dec 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
34
0
0
13 Nov 2024
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation
Changcheng Xiao
Qiong Cao
Yujie Zhong
Xiang Zhang
Tao Wang
Canqun Yang
L. Lan
23
0
0
17 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
24
5
0
10 Oct 2024
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Amaia Cardiel
Éloi Zablocki
Oriane Siméoni
Elias Ramzi
Matthieu Cord
VLM
23
0
0
18 Sep 2024
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression
Jingcheng Ke
Dele Wang
Jun-Cheng Chen
I-Hong Jhuo
Chia-Wen Lin
Yen-Yu Lin
31
0
0
05 Sep 2024
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
Runwei Guan
Jianan Liu
Liye Jia
Haocheng Zhao
Shanliang Yao
Xiaohui Zhu
Ka Lok Man
Eng Gee Lim
Jeremy S. Smith
Yutao Yue
49
5
0
30 Aug 2024
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Xiao-Yang Liu
Tianjie Zhang
Yu Gu
Iat Long Iong
Yifan Xu
...
Zhengxiao Du
Chan Hee Song
Yu Su
Yuxiao Dong
Jie Tang
VLM
LLMAG
47
22
0
12 Aug 2024
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
Jierun Chen
Fangyun Wei
Jinjing Zhao
Sizhe Song
Bohuai Wu
Zhuoxuan Peng
S.-H. Gary Chan
Hongyang R. Zhang
33
8
0
24 Jun 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
35
9
0
09 Jun 2024
MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains
Zhaohuan Zhan
Lisha Yu
Sijie Yu
Guang Tan
LLMAG
LM&Ro
51
10
0
17 May 2024
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
34
8
0
20 Apr 2024
Referring Flexible Image Restoration
Runwei Guan
Rongsheng Hu
Zhuhao Zhou
Tianlang Xue
Ka Lok Man
Jeremy S. Smith
Eng Gee Lim
Weiping Ding
Yutao Yue
32
0
0
16 Apr 2024
LocCa: Visual Pretraining with Location-aware Captioners
Bo Wan
Michael Tschannen
Yongqin Xian
Filip Pavetić
Ibrahim M. Alabdulmohsin
Xiao Wang
André Susano Pinto
Andreas Steiner
Lucas Beyer
Xiao-Qi Zhai
VLM
44
6
0
28 Mar 2024
J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution
Nobuhiro Ueda
Hideko Habe
Yoko Matsui
Akishige Yuguchi
Seiya Kawano
Yasutomo Kawanishi
Sadao Kurohashi
Koichiro Yoshino
28
2
0
28 Mar 2024
Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation
Bowen Huang
Yanwei Zheng
Chuanlin Lan
Xinpeng Zhao
Yifei Zou
Dongxiao Yu
36
0
0
23 Mar 2024
MyVLM: Personalizing VLMs for User-Specific Queries
Yuval Alaluf
Elad Richardson
Sergey Tulyakov
Kfir Aberman
Daniel Cohen-Or
MLLM
VLM
38
18
0
21 Mar 2024
VL-Mamba: Exploring State Space Models for Multimodal Learning
Yanyuan Qiao
Zheng Yu
Longteng Guo
Sihan Chen
Zijia Zhao
Mingzhen Sun
Qi Wu
Jing Liu
Mamba
37
65
0
20 Mar 2024
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
Runwei Guan
Liye Jia
Fengyufan Yang
Shanliang Yao
Erick Purwanto
...
Eng Gee Lim
Jeremy S. Smith
Ka Lok Man
Xuming Hu
Yutao Yue
34
9
0
19 Mar 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
32
3
0
19 Feb 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
37
83
0
06 Dec 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Georgios Tziafas
Yucheng Xu
Arushi Goel
M. Kasaei
Zhibin Li
H. Kasaei
32
23
0
09 Nov 2023
Toloka Visual Question Answering Benchmark
Mert Pilanci
Nikita Pavlichenko
Sergey Koshelev
Daniil Likhobaba
Alisa Smirnova
27
4
0
28 Sep 2023
Dense Object Grounding in 3D Scenes
Wencan Huang
Daizong Liu
Wei Hu
13
17
0
05 Sep 2023
SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
Ziyan Yang
Kushal Kafle
Zhe-nan Lin
Scott D. Cohen
Zhihong Ding
Vicente Ordonez
23
1
0
24 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
Yuhao Lu
Yixuan Fan
Beixing Deng
F. Liu
Yali Li
Shengjin Wang
33
28
0
01 Aug 2023
Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions
Yui Iioka
Y. Yoshida
Yuiga Wada
Shumpei Hatanaka
K. Sugiura
DiffM
42
5
0
17 Jul 2023
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Ming Yan
Yaowei Wang
Changsheng Xu
ObjD
VLM
31
30
0
15 May 2023
Natural Language Robot Programming: NLP integrated with autonomous robotic grasping
Muhammad Arshad Khan
Max Kenney
Jack Painter
Disha Kamale
R. Batista-Navarro
Amir M. Ghalamzan-E.
LM&Ro
8
4
0
06 Apr 2023
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding
Ziyang Lu
Yunqiang Pei
Guoqing Wang
Yang Yang
Zheng Wang
Heng Tao Shen
46
6
0
23 Mar 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
23
26
0
01 Feb 2023
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
Haoxuan You
Rui Sun
Zhecan Wang
Kai-Wei Chang
Shih-Fu Chang
14
4
0
14 Dec 2022
Extending Phrase Grounding with Pronouns in Visual Dialogues
Panzhong Lu
Xin Zhang
Meishan Zhang
Min Zhang
ObjD
22
4
0
23 Oct 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
Gen Luo
Yiyi Zhou
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
18
10
0
17 Apr 2022
FindIt: Generalized Localization with Natural Language Queries
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
11
17
0
31 Mar 2022
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo
Yiran Luo
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
16
4
0
30 Mar 2022
Interactive Robotic Grasping with Attribute-Guided Disambiguation
Yang Yang
Xibai Lou
Changhyun Choi
16
30
0
15 Mar 2022
Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding
Yang Jiao
Zequn Jie
Jingjing Chen
Lin Ma
Yu-Gang Jiang
OOD
15
7
0
10 Mar 2022
Towards Automated Error Analysis: Learning to Characterize Errors
Tong Gao
Shivang Singh
Raymond J. Mooney
6
1
0
13 Jan 2022
CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings
Gabriel Skantze
Bram Willemsen
VLM
6
13
0
15 Nov 2021
Audio-Visual Grounding Referring Expression for Robotic Manipulation
Yefei Wang
Kaili Wang
Yi Wang
Di Guo
Huaping Liu
F. Sun
32
12
0
22 Sep 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
21
63
0
25 Aug 2021
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
18
21
0
16 Aug 2021
1
2
Next