ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2007.09554
  4. Cited By
Referring Expression Comprehension: A Survey of Methods and Datasets

Referring Expression Comprehension: A Survey of Methods and Datasets

19 July 2020
Yanyuan Qiao
Chaorui Deng
Qi Wu
    ObjD
ArXivPDFHTML

Papers citing "Referring Expression Comprehension: A Survey of Methods and Datasets"

50 / 55 papers shown
Title
Human-like compositional learning of visually-grounded concepts using synthetic environments
Human-like compositional learning of visually-grounded concepts using synthetic environments
Zijun Lin
M Ganesh Kumar
Cheston Tan
OCL
CoGe
70
0
0
09 Apr 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
C. L. P. Chen
Zonghao Guo
Derek F. Wong
Xiaoyi Feng
Maosong Sun
VLM
LRM
76
0
0
17 Mar 2025
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
Zilun Zhang
Haozhan Shen
Tiancheng Zhao
Bin Chen
Zian Guan
Yuhao Wang
Xu Jia
Yuxiang Cai
Yongheng Shang
Jianwei Yin
54
0
0
16 Mar 2025
Referring to Any Person
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
148
0
0
11 Mar 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
48
0
0
24 Feb 2025
Accounting for Focus Ambiguity in Visual Questions
Chongyan Chen
Yu-Yun Tseng
Zhuoheng Li
Anush Venkatesh
Danna Gurari
41
0
0
04 Jan 2025
Towards Visual Grounding: A Survey
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
55
3
0
31 Dec 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference
  Understanding
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
34
0
0
13 Nov 2024
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object
  Tracking and Segmentation
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation
Changcheng Xiao
Qiong Cao
Yujie Zhong
Xiang Zhang
Tao Wang
Canqun Yang
L. Lan
23
0
0
17 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with
  Mask Referring Modeling
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
24
5
0
10 Oct 2024
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Amaia Cardiel
Éloi Zablocki
Oriane Siméoni
Elias Ramzi
Matthieu Cord
VLM
23
0
0
18 Sep 2024
Make Graph-based Referring Expression Comprehension Great Again through
  Expression-guided Dynamic Gating and Regression
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression
Jingcheng Ke
Dele Wang
Jun-Cheng Chen
I-Hong Jhuo
Chia-Wen Lin
Yen-Yu Lin
31
0
0
05 Sep 2024
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
Runwei Guan
Jianan Liu
Liye Jia
Haocheng Zhao
Shanliang Yao
Xiaohui Zhu
Ka Lok Man
Eng Gee Lim
Jeremy S. Smith
Yutao Yue
49
5
0
30 Aug 2024
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation
  Agents
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Xiao-Yang Liu
Tianjie Zhang
Yu Gu
Iat Long Iong
Yifan Xu
...
Zhengxiao Du
Chan Hee Song
Yu Su
Yuxiao Dong
Jie Tang
VLM
LLMAG
47
22
0
12 Aug 2024
Revisiting Referring Expression Comprehension Evaluation in the Era of
  Large Multimodal Models
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
Jierun Chen
Fangyun Wei
Jinjing Zhao
Sizhe Song
Bohuai Wu
Zhuoxuan Peng
S.-H. Gary Chan
Hongyang R. Zhang
33
8
0
24 Jun 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances,
  and Future Directions
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
35
9
0
09 Jun 2024
MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and
  Reasoning Chains
MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains
Zhaohuan Zhan
Lisha Yu
Sijie Yu
Guang Tan
LLMAG
LM&Ro
51
10
0
17 May 2024
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual
  Grounding
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
34
8
0
20 Apr 2024
Referring Flexible Image Restoration
Referring Flexible Image Restoration
Runwei Guan
Rongsheng Hu
Zhuhao Zhou
Tianlang Xue
Ka Lok Man
Jeremy S. Smith
Eng Gee Lim
Weiping Ding
Yutao Yue
32
0
0
16 Apr 2024
LocCa: Visual Pretraining with Location-aware Captioners
LocCa: Visual Pretraining with Location-aware Captioners
Bo Wan
Michael Tschannen
Yongqin Xian
Filip Pavetić
Ibrahim M. Alabdulmohsin
Xiao Wang
André Susano Pinto
Andreas Steiner
Lucas Beyer
Xiao-Qi Zhai
VLM
44
6
0
28 Mar 2024
J-CRe3: A Japanese Conversation Dataset for Real-world Reference
  Resolution
J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution
Nobuhiro Ueda
Hideko Habe
Yoko Matsui
Akishige Yuguchi
Seiya Kawano
Yasutomo Kawanishi
Sadao Kurohashi
Koichiro Yoshino
28
2
0
28 Mar 2024
Temporal-Spatial Object Relations Modeling for Vision-and-Language
  Navigation
Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation
Bowen Huang
Yanwei Zheng
Chuanlin Lan
Xinpeng Zhao
Yifei Zou
Dongxiao Yu
36
0
0
23 Mar 2024
MyVLM: Personalizing VLMs for User-Specific Queries
MyVLM: Personalizing VLMs for User-Specific Queries
Yuval Alaluf
Elad Richardson
Sergey Tulyakov
Kfir Aberman
Daniel Cohen-Or
MLLM
VLM
38
18
0
21 Mar 2024
VL-Mamba: Exploring State Space Models for Multimodal Learning
VL-Mamba: Exploring State Space Models for Multimodal Learning
Yanyuan Qiao
Zheng Yu
Longteng Guo
Sihan Chen
Zijia Zhao
Mingzhen Sun
Qi Wu
Jing Liu
Mamba
37
65
0
20 Mar 2024
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and
  mmWave Radar
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
Runwei Guan
Liye Jia
Fengyufan Yang
Shanliang Yao
Erick Purwanto
...
Eng Gee Lim
Jeremy S. Smith
Ka Lok Man
Xuming Hu
Yutao Yue
34
9
0
19 Mar 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot
  Interaction
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
32
3
0
19 Feb 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
37
83
0
06 Dec 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in
  Clutter
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Georgios Tziafas
Yucheng Xu
Arushi Goel
M. Kasaei
Zhibin Li
H. Kasaei
32
23
0
09 Nov 2023
Toloka Visual Question Answering Benchmark
Toloka Visual Question Answering Benchmark
Mert Pilanci
Nikita Pavlichenko
Sergey Koshelev
Daniil Likhobaba
Alisa Smirnova
27
4
0
28 Sep 2023
Dense Object Grounding in 3D Scenes
Dense Object Grounding in 3D Scenes
Wencan Huang
Daizong Liu
Wei Hu
13
17
0
05 Sep 2023
SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
Ziyan Yang
Kushal Kafle
Zhe-nan Lin
Scott D. Cohen
Zhihong Ding
Vicente Ordonez
23
1
0
24 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects
  in Cluttered Indoor Scenes
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
Yuhao Lu
Yixuan Fan
Beixing Deng
F. Liu
Yali Li
Shengjin Wang
33
28
0
01 Aug 2023
Multimodal Diffusion Segmentation Model for Object Segmentation from
  Manipulation Instructions
Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions
Yui Iioka
Y. Yoshida
Yuiga Wada
Shumpei Hatanaka
K. Sugiura
DiffM
42
5
0
17 Jul 2023
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Ming Yan
Yaowei Wang
Changsheng Xu
ObjD
VLM
31
30
0
15 May 2023
Natural Language Robot Programming: NLP integrated with autonomous
  robotic grasping
Natural Language Robot Programming: NLP integrated with autonomous robotic grasping
Muhammad Arshad Khan
Max Kenney
Jack Painter
Disha Kamale
R. Batista-Navarro
Amir M. Ghalamzan-E.
LM&Ro
8
4
0
06 Apr 2023
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference
  Understanding
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding
Ziyang Lu
Yunqiang Pei
Guoqing Wang
Yang Yang
Zheng Wang
Heng Tao Shen
46
6
0
23 Mar 2023
Multimodality Representation Learning: A Survey on Evolution,
  Pretraining and Its Applications
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
23
26
0
01 Feb 2023
Find Someone Who: Visual Commonsense Understanding in Human-Centric
  Grounding
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
Haoxuan You
Rui Sun
Zhecan Wang
Kai-Wei Chang
Shih-Fu Chang
14
4
0
14 Dec 2022
Extending Phrase Grounding with Pronouns in Visual Dialogues
Extending Phrase Grounding with Pronouns in Visual Dialogues
Panzhong Lu
Xin Zhang
Meishan Zhang
Min Zhang
ObjD
22
4
0
23 Oct 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of
  One-Stage Referring Expression Comprehension
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
Gen Luo
Yiyi Zhou
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
18
10
0
17 Apr 2022
FindIt: Generalized Localization with Natural Language Queries
FindIt: Generalized Localization with Natural Language Queries
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
11
17
0
31 Mar 2022
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo
Yiran Luo
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
16
4
0
30 Mar 2022
Interactive Robotic Grasping with Attribute-Guided Disambiguation
Interactive Robotic Grasping with Attribute-Guided Disambiguation
Yang Yang
Xibai Lou
Changhyun Choi
16
30
0
15 Mar 2022
Suspected Object Matters: Rethinking Model's Prediction for One-stage
  Visual Grounding
Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding
Yang Jiao
Zequn Jie
Jingjing Chen
Lin Ma
Yu-Gang Jiang
OOD
15
7
0
10 Mar 2022
Towards Automated Error Analysis: Learning to Characterize Errors
Towards Automated Error Analysis: Learning to Characterize Errors
Tong Gao
Shivang Singh
Raymond J. Mooney
6
1
0
13 Jan 2022
CoLLIE: Continual Learning of Language Grounding from Language-Image
  Embeddings
CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings
Gabriel Skantze
Bram Willemsen
VLM
6
13
0
15 Nov 2021
Audio-Visual Grounding Referring Expression for Robotic Manipulation
Audio-Visual Grounding Referring Expression for Robotic Manipulation
Yefei Wang
Kaili Wang
Yi Wang
Di Guo
Huaping Liu
F. Sun
32
12
0
22 Sep 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
21
63
0
25 Aug 2021
Who's Waldo? Linking People Across Text and Images
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
18
21
0
16 Aug 2021
12
Next