Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1608.00272
Cited By
Modeling Context in Referring Expressions
31 July 2016
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Modeling Context in Referring Expressions"
50 / 276 papers shown
Title
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
21
21
0
15 Nov 2022
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
Jiazhan Feng
Qingfeng Sun
Can Xu
Pu Zhao
Yaming Yang
Chongyang Tao
Dongyan Zhao
Qingwei Lin
32
52
0
10 Nov 2022
Extending Phrase Grounding with Pronouns in Visual Dialogues
Panzhong Lu
Xin Zhang
Meishan Zhang
Min Zhang
ObjD
30
4
0
23 Oct 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
Yangfan Zhan
Zhitong Xiong
Yuan. Yuan
78
107
0
23 Oct 2022
ULN: Towards Underspecified Vision-and-Language Navigation
Weixi Feng
Tsu-Jui Fu
Yujie Lu
William Yang Wang
49
5
0
18 Oct 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
21
8
0
09 Oct 2022
MUG: Interactive Multimodal Grounding on User Interfaces
Tao Li
Gang Li
Jingjie Zheng
Purple Wang
Yang Li
LLMAG
41
8
0
29 Sep 2022
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
30
23
0
28 Sep 2022
VLMAE: Vision-Language Masked Autoencoder
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Chen Wu
Xiujun Shu
Bohan Ren
VLM
34
11
0
19 Aug 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
36
2
0
02 Jul 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
30
124
0
15 Jun 2022
Referring Image Matting
Jizhizi Li
Jing Zhang
Dacheng Tao
ObjD
VLM
29
23
0
10 Jun 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
CoGe
VLM
32
13
0
30 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
29
38
0
23 May 2022
Weakly-supervised segmentation of referring expressions
Robin Strudel
Ivan Laptev
Cordelia Schmid
22
21
0
10 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
44
160
0
30 Apr 2022
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
Li Yang
Yan Xu
Chunfen Yuan
Wei Liu
Bing Li
Weiming Hu
ObjD
50
113
0
30 Apr 2022
Instance-Specific Feature Propagation for Referring Segmentation
Chang Liu
Xudong Jiang
Henghui Ding
ISeg
30
55
0
26 Apr 2022
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection
Jun-Bin Luo
Jiahui Fu
Xianghao Kong
Chen Gao
Haibing Ren
Hao Shen
Huaxia Xia
Si Liu
37
89
0
13 Apr 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Sanjay Subramanian
William Merrill
Trevor Darrell
Matt Gardner
Sameer Singh
Anna Rohrbach
ObjD
44
126
0
12 Apr 2022
Multi-View Transformer for 3D Visual Grounding
Shijia Huang
Yilun Chen
Jiaya Jia
Liwei Wang
31
114
0
05 Apr 2022
ReSTR: Convolution-free Referring Image Segmentation Using Transformers
N. Kim
Dongwon Kim
Cuiling Lan
Wenjun Zeng
Suha Kwak
30
137
0
31 Mar 2022
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo
Yiran Luo
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
29
4
0
30 Mar 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
16
61
0
29 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
25
16
0
27 Mar 2022
Emergence of hierarchical reference systems in multi-agent communication
Xenia Ohmer
M. Duda
Elia Bruni
25
8
0
24 Mar 2022
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
31
74
0
18 Mar 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
48
51
0
16 Mar 2022
Interactive Robotic Grasping with Attribute-Guided Disambiguation
Yang Yang
Xibai Lou
Changhyun Choi
21
30
0
15 Mar 2022
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Mingyang Zhou
Licheng Yu
Amanpreet Singh
Mengjiao MJ Wang
Zhou Yu
Ning Zhang
VLM
30
31
0
01 Mar 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
74
850
0
07 Feb 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
20
10
0
18 Jan 2022
Language as Queries for Referring Video Object Segmentation
Jiannan Wu
Yi-Xin Jiang
Pei Sun
Zehuan Yuan
Ping Luo
31
142
0
03 Jan 2022
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi
Xiuye Gu
Huayu Chen
Nayeon Lee
VLM
47
371
0
22 Dec 2021
Image Segmentation Using Text and Image Prompts
Timo Lüddecke
Alexander S. Ecker
CLIP
VLM
25
457
0
18 Dec 2021
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision
A. Shrivastava
Ramprasaath R. Selvaraju
Nikhil Naik
Vicente Ordonez
VLM
CLIP
30
6
0
14 Dec 2021
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip Torr
148
309
0
04 Dec 2021
End-to-End Referring Video Object Segmentation with Multimodal Transformers
Adam Botach
Evgenii Zheltonozhskii
Chaim Baskin
VOS
34
141
0
29 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Faisal Ahmed
Zicheng Liu
Yumao Lu
Lijuan Wang
29
111
0
23 Nov 2021
MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation
Zizhang Li
Mengmeng Wang
Jianbiao Mei
Yong Liu
20
18
0
21 Nov 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
211
221
0
24 Sep 2021
Audio-Visual Grounding Referring Expression for Robotic Manipulation
Yefei Wang
Kaili Wang
Yi Wang
Di Guo
Huaping Liu
F. Sun
38
12
0
22 Sep 2021
YouRefIt: Embodied Reference Understanding with Language and Gesture
Yixin Chen
Qing Li
Deqian Kong
Yik Lun Kei
Song-Chun Zhu
Tao Gao
Yixin Zhu
Siyuan Huang
LM&Ro
37
41
0
08 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
30
18
0
04 Sep 2021
Vision-Language Transformer and Query Generation for Referring Segmentation
Henghui Ding
Chang-rui Liu
Suchen Wang
Xudong Jiang
40
252
0
12 Aug 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Chenyu You
Caiming Xiong
Guosheng Lin
FaML
83
1,893
0
16 Jul 2021
Leveraging Explainability for Comprehending Referring Expressions in the Real World
Fethiye Irmak Dogan
G. I. Melsión
Iolanda Leite
39
8
0
12 Jul 2021
LanguageRefer: Spatial-Language Model for 3D Visual Grounding
Junha Roh
Karthik Desingh
Ali Farhadi
Dieter Fox
22
95
0
07 Jul 2021
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations?
Thierry Deruyttere
Victor Milewski
Marie-Francine Moens
30
15
0
08 Jun 2021
Previous
1
2
3
4
5
6
Next