Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1612.09542
Cited By
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
30 December 2016
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Joint Speaker-Listener-Reinforcer Model for Referring Expressions"
50 / 53 papers shown
Title
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
159
2
0
14 Jan 2025
Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation
Chang Liu
Henghui Ding
Yulun Zhang
Xudong Jiang
26
47
0
24 May 2023
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
Zhi Zhang
H. Yannakoudakis
Xiantong Zhen
Ekaterina Shutova
29
2
0
17 Feb 2023
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network
Haowei Wang
Jiayi Ji
Yiyi Zhou
Yongjian Wu
Xiaoshuai Sun
33
15
0
09 Jan 2023
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding
Eslam Mohamed Bakr
Yasmeen Alsaedy
Mohamed Elhoseiny
3DPC
23
41
0
25 Nov 2022
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation
Shijia Huang
Feng Li
Hao Zhang
Siyi Liu
Lei Zhang
Liwei Wang
30
5
0
15 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
21
21
0
15 Nov 2022
Instance-Specific Feature Propagation for Referring Segmentation
Chang Liu
Xudong Jiang
Henghui Ding
ISeg
22
55
0
26 Apr 2022
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
28
94
0
30 Mar 2022
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
29
74
0
18 Mar 2022
Phrase-Based Affordance Detection via Cyclic Bilateral Interaction
Liangsheng Lu
Wei Zhai
Hongcheng Luo
Yu Kang
Yang Cao
21
19
0
24 Feb 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
18
10
0
18 Jan 2022
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Qirui Wu
Matthias Nießner
Angel X. Chang
21
29
0
02 Dec 2021
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations?
Thierry Deruyttere
Victor Milewski
Marie-Francine Moens
30
15
0
08 Jun 2021
Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation
Guang Feng
Zhiwei Hu
Lihe Zhang
Huchuan Lu
EgoV
25
168
0
05 May 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
21
16
0
23 Mar 2021
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images
Haolin Liu
Anran Lin
Xiaoguang Han
Lei Yang
Yizhou Yu
Shuguang Cui
27
39
0
14 Mar 2021
Human-centric Spatio-Temporal Video Grounding With Visual Transformers
Zongheng Tang
Yue Liao
Si Liu
Guanbin Li
Xiaojie Jin
Hongxu Jiang
Qian Yu
Dong Xu
21
94
0
10 Nov 2020
Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts
Ece Takmaz
Mario Giulianelli
Sandro Pezzelle
Arabella J. Sinclair
Raquel Fernández
20
26
0
09 Nov 2020
Utilizing Every Image Object for Semi-supervised Phrase Grounding
Haidong Zhu
Arka Sadhu
Zhao-Heng Zheng
Ram Nevatia
ObjD
19
7
0
05 Nov 2020
RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation
Míriam Bellver
Carles Ventura
Carina Silberer
Ioannis V. Kazakos
Jordi Torres
Xavier Giró-i-Nieto
VOS
26
32
0
01 Oct 2020
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen
Wenbo Ma
Jun Xiao
Hanwang Zhang
Shih-Fu Chang
ObjD
17
89
0
03 Sep 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
50
93
0
19 Jul 2020
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge
Peng Wang
Dongyang Liu
Hui Li
Qi Wu
ObjD
24
19
0
02 Jun 2020
Dynamic Language Binding in Relational Visual Reasoning
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
NAI
26
19
0
30 Apr 2020
Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding
Thierry Deruyttere
Guillem Collell
Marie-Francine Moens
LRM
13
8
0
19 Mar 2020
A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects
A. Magassouba
K. Sugiura
Hisashi Kawai
38
9
0
23 Dec 2019
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Dave Zhenyu Chen
Angel X. Chang
Matthias Nießner
3DPC
32
347
0
18 Dec 2019
Grounding-Tracking-Integration
Zhengyuan Yang
T. Kumar
Tianlang Chen
Jinsong Su
Jiebo Luo
27
53
0
13 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
24
60
0
07 Dec 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
14
360
0
18 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
27
38
0
12 Aug 2019
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
Yulei Niu
Hanwang Zhang
Zhiwu Lu
Shih-Fu Chang
ObjD
BDL
36
24
0
08 Jul 2019
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
31
171
0
10 May 2019
ShapeGlot: Learning Language for Shape Differentiation
Panos Achlioptas
Judy Fan
Robert D. Hawkins
Noah D. Goodman
Leonidas J. Guibas
36
82
0
08 May 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
31
227
0
25 Apr 2019
Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments
Fethiye Irmak Dogan
Sinan Kalkan
Iolanda Leite
18
19
0
15 Apr 2019
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Xihui Liu
Zihao Wang
Jing Shao
Xiaogang Wang
Hongsheng Li
ObjD
19
180
0
03 Mar 2019
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
Peng Wang
Qi Wu
Jiewei Cao
Chunhua Shen
Lianli Gao
Anton Van Den Hengel
ObjD
22
252
0
12 Dec 2018
Counterfactual Critic Multi-Agent Training for Scene Graph Generation
Long Chen
Hanwang Zhang
Jun Xiao
Xiangnan He
Shiliang Pu
Shih-Fu Chang
14
159
0
06 Dec 2018
Learning to Explain with Complemental Examples
Atsushi Kanehira
Tatsuya Harada
12
40
0
04 Dec 2018
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
32
865
0
27 Nov 2018
Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models
Tong Niu
Joey Tianyi Zhou
AAML
21
85
0
06 Sep 2018
TVQA: Localized, Compositional Video Question Answering
Muhammad Abdul Wahab
Licheng Yu
Mounir Nasr Allah
Tamara L. Berg
36
616
0
05 Sep 2018
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
Zhou Yu
Jun-chen Yu
Chenchao Xiang
Zhou Zhao
Q. Tian
Dacheng Tao
ObjD
18
138
0
09 May 2018
Referring Relationships
Ranjay Krishna
Ines Chami
Michael S. Bernstein
Li Fei-Fei
30
94
0
28 Mar 2018
Discriminability objective for training descriptive captions
Ruotian Luo
Brian L. Price
Scott D. Cohen
Gregory Shakhnarovich
24
202
0
12 Mar 2018
Grounding Referring Expressions in Images by Variational Context
Hanwang Zhang
Yulei Niu
Shih-Fu Chang
BDL
ObjD
21
219
0
05 Dec 2017
Reasoning about Fine-grained Attribute Phrases using Reference Games
Jong-Chyi Su
Chenyun Wu
Huaizu Jiang
Subhransu Maji
34
16
0
29 Aug 2017
Translating Neuralese
Jacob Andreas
Anca Dragan
Dan Klein
29
58
0
23 Apr 2017
1
2
Next