Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1511.03745
Cited By
Grounding of Textual Phrases in Images by Reconstruction
12 November 2015
Anna Rohrbach
Marcus Rohrbach
Ronghang Hu
Trevor Darrell
Bernt Schiele
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Grounding of Textual Phrases in Images by Reconstruction"
50 / 101 papers shown
Title
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
50
93
0
19 Jul 2020
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge
Peng Wang
Dongyang Liu
Hui Li
Qi Wu
ObjD
24
19
0
02 Jun 2020
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
Zhiyuan Fang
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
20
60
0
11 Mar 2020
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Dave Zhenyu Chen
Angel X. Chang
Matthias Nießner
3DPC
32
347
0
18 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
24
60
0
07 Dec 2019
Adversarial reconstruction for Multi-modal Machine Translation
Jean-Benoit Delbrouck
Stéphane Dupont
GAN
37
2
0
07 Oct 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
31
295
0
06 Oct 2019
Dynamic Graph Attention for Referring Expression Comprehension
Sibei Yang
Guanbin Li
Yizhou Yu
OCL
14
215
0
18 Sep 2019
Phrase Localization Without Paired Training Examples
Josiah Wang
Lucia Specia
32
41
0
20 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
14
360
0
18 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
27
38
0
12 Aug 2019
Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection
Keren Ye
Ruotong Wang
Adriana Kovashka
Wei Li
Danfeng Qin
Jesse Berent
25
60
0
23 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
20
132
0
22 Jul 2019
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
Yulei Niu
Hanwang Zhang
Zhiwu Lu
Shih-Fu Chang
ObjD
BDL
36
24
0
08 Jul 2019
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
31
171
0
10 May 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
31
227
0
25 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel
Lillian Lee
David M. Mimno
31
30
0
16 Apr 2019
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Xihui Liu
Zihao Wang
Jing Shao
Xiaogang Wang
Hongsheng Li
ObjD
19
180
0
03 Mar 2019
Interpretable machine learning: definitions, methods, and applications
W. James Murdoch
Chandan Singh
Karl Kumbier
R. Abbasi-Asl
Bin-Xia Yu
XAI
HAI
47
1,417
0
14 Jan 2019
Grounded Video Description
Luowei Zhou
Yannis Kalantidis
Xinlei Chen
Jason J. Corso
Marcus Rohrbach
27
190
0
17 Dec 2018
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
Peng Wang
Qi Wu
Jiewei Cao
Chunhua Shen
Lianli Gao
Anton Van Den Hengel
ObjD
22
252
0
12 Dec 2018
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
23
51
0
03 Dec 2018
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
32
866
0
27 Nov 2018
TVQA: Localized, Compositional Video Question Answering
Muhammad Abdul Wahab
Licheng Yu
Mounir Nasr Allah
Tamara L. Berg
36
617
0
05 Sep 2018
Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions
Ke Ning
Linchao Zhu
Ming Cai
Yi Yang
Di Xie
Fei Wu
18
2
0
27 Aug 2018
Topic-Guided Attention for Image Captioning
Zhihao Zhu
Zhan Xue
Zejian Yuan
30
23
0
10 Jul 2018
Visual Referring Expression Recognition: What Do Systems Actually Learn?
Volkan Cirik
Louis-Philippe Morency
Taylor Berg-Kirkpatrick
28
63
0
30 May 2018
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
Zhou Yu
Jun-chen Yu
Chenchao Xiang
Zhou Zhao
Q. Tian
Dacheng Tao
ObjD
18
138
0
09 May 2018
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
Raymond A. Yeh
Jinjun Xiong
Wen-mei W. Hwu
Minh Do
A. Schwing
30
57
0
29 Mar 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
A. Schwing
22
40
0
29 Mar 2018
Referring Relationships
Ranjay Krishna
Ines Chami
Michael S. Bernstein
Li Fei-Fei
30
94
0
28 Mar 2018
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
200
434
0
27 Mar 2018
Interactive Grounded Language Acquisition and Generalization in a 2D World
Haonan Yu
Haichao Zhang
Wenyuan Xu
LLMAG
LM&Ro
25
77
0
31 Jan 2018
Grounding Referring Expressions in Images by Variational Context
Hanwang Zhang
Yulei Niu
Shih-Fu Chang
BDL
ObjD
21
219
0
05 Dec 2017
Object Referring in Visual Scene with Spoken Language
A. Vasudevan
Dengxin Dai
Luc Van Gool
37
18
0
10 Nov 2017
Semantic Image Retrieval via Active Grounding of Visual Situations
Max H. Quinn
E. Conser
Jordan M. Witte
Melanie Mitchell
11
9
0
31 Oct 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Chuang Gan
Yandong Li
Haoxiang Li
Chen Sun
Boqing Gong
27
126
0
15 Aug 2017
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
Xuwang Yin
Vicente Ordonez
VLM
37
55
0
22 Jul 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
32
138
0
03 May 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
27
494
0
11 Apr 2017
Detecting Visual Relationships with Deep Relational Networks
Bo Dai
Yuqi Zhang
Dahua Lin
GNN
56
500
0
11 Apr 2017
Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu
Zhe-nan Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Alan Yuille
EgoV
36
234
0
23 Mar 2017
An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning
Fan Wu
Zhongwen Xu
Yi Yang
ObjD
34
11
0
22 Mar 2017
Comprehension-guided referring expressions
Ruotian Luo
Gregory Shakhnarovich
ObjD
29
171
0
12 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
40
272
0
30 Dec 2016
Top-down Visual Saliency Guided by Captions
Vasili Ramanishka
Abir Das
Jianming Zhang
Kate Saenko
21
142
0
21 Dec 2016
Areas of Attention for Image Captioning
M. Pedersoli
Thomas Lucas
Cordelia Schmid
Jakob Verbeek
30
205
0
03 Dec 2016
Modeling Relationships in Referential Expressions with Compositional Modular Networks
Ronghang Hu
Marcus Rohrbach
Jacob Andreas
Trevor Darrell
Kate Saenko
42
401
0
30 Nov 2016
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
57
990
0
26 Nov 2016
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
28
1,227
0
31 Jul 2016
Previous
1
2
3
Next