ResearchTrend.AI
Modeling Context in Referring Expressions

31 July 2016
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
arXiv: 1608.00272

Papers citing "Modeling Context in Referring Expressions"

50 / 275 papers shown
Cross-Modal Progressive Comprehension for Referring Segmentation
Si Liu
Tianrui Hui
Shaofei Huang
Yunchao Wei
Bo-wen Li
Guanbin Li
EgoV
VOS
28
124
0
15 May 2021
Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation
Guang Feng
Zhiwei Hu
Lihe Zhang
Huchuan Lu
EgoV
25
168
0
05 May 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
90
864
0
26 Apr 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
109
54
0
23 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
28
332
0
17 Apr 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
21
16
0
23 Mar 2021
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Siqi Sun
Yen-Chun Chen
Linjie Li
Shuohang Wang
Yuwei Fang
Jingjing Liu
VLM
38
82
0
16 Mar 2021
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images
Haolin Liu
Anran Lin
Xiaoguang Han
Lei Yang
Yizhou Yu
Shuguang Cui
27
40
0
14 Mar 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
277
525
0
04 Feb 2021
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning
Shivangi Aneja
C. Bregler
Matthias Nießner
SSL
60
48
0
15 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
227
2,434
0
04 Jan 2021
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
Linjie Li
Zhe Gan
Jingjing Liu
VLM
33
42
0
15 Dec 2020
Human-centric Spatio-Temporal Video Grounding With Visual Transformers
Zongheng Tang
Yue Liao
Si Liu
Guanbin Li
Xiaojie Jin
Hongxu Jiang
Qian Yu
Dong Xu
21
94
0
10 Nov 2020
Utilizing Every Image Object for Semi-supervised Phrase Grounding
Haidong Zhu
Arka Sadhu
Zhao-Heng Zheng
Ram Nevatia
ObjD
25
7
0
05 Nov 2020
Referring Image Segmentation via Cross-Modal Progressive Comprehension
Shaofei Huang
Tianrui Hui
Si Liu
Guanbin Li
Yunchao Wei
Jizhong Han
Luoqi Liu
Bo-wen Li
EgoV
29
176
0
01 Oct 2020
RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation
Míriam Bellver
Carles Ventura
Carina Silberer
Ioannis V. Kazakos
Jordi Torres
Xavier Giró-i-Nieto
VOS
29
32
0
01 Oct 2020
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary
Thierry Deruyttere
Simon Vandenhende
Dusan Grujicic
Yu Liu
Luc Van Gool
Matthew Blaschko
Tinne Tuytelaars
Marie-Francine Moens
30
6
0
18 Sep 2020
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen
Wenbo Ma
Jun Xiao
Hanwang Zhang
Shih-Fu Chang
ObjD
17
89
0
03 Sep 2020
Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization
Daizong Liu
Xiaoye Qu
Xiao-Yang Liu
Jianfeng Dong
Pan Zhou
Zichuan Xu
33
129
0
04 Aug 2020
PhraseCut: Language-based Image Segmentation in the Wild
Chenyun Wu
Zhe-nan Lin
Scott D. Cohen
Trung Bui
Subhransu Maji
VLM
13
111
0
03 Aug 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
50
93
0
19 Jul 2020
Visual Relation Grounding in Videos
Junbin Xiao
Xindi Shang
Xun Yang
Sheng Tang
Tat-Seng Chua
20
40
0
17 Jul 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
35
489
0
11 Jun 2020
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge
Peng Wang
Dongyang Liu
Hui Li
Qi Wu
ObjD
24
19
0
02 Jun 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
22
127
0
15 May 2020
Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding
Thierry Deruyttere
Guillem Collell
Marie-Francine Moens
LRM
19
8
0
19 Mar 2020
Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension
Zhenfang Chen
Peng Wang
Lin Ma
Kwan-Yee K. Wong
Qi Wu
ObjD
34
68
0
01 Mar 2020
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Dave Zhenyu Chen
Angel X. Chang
Matthias Nießner
3DPC
47
350
0
18 Dec 2019
Grounding-Tracking-Integration
Zhengyuan Yang
T. Kumar
Tianlang Chen
Jinsong Su
Jiebo Luo
27
53
0
13 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
24
60
0
07 Dec 2019
Talk2Car: Taking Control of Your Self-Driving Car
Thierry Deruyttere
Simon Vandenhende
Dusan Grujicic
Luc Van Gool
Marie-Francine Moens
LM&Ro
31
124
0
24 Sep 2019
Dynamic Graph Attention for Referring Expression Comprehension
Sibei Yang
Guanbin Li
Yizhou Yu
OCL
25
215
0
18 Sep 2019
Phrase Localization Without Paired Training Examples
Josiah Wang
Lucia Specia
35
42
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
30
157
0
20 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
14
360
0
18 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
27
38
0
12 Aug 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
25
132
0
22 Jul 2019
MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment
N. Ilinykh
Sina Zarrieß
David Schlangen
27
43
0
11 Jul 2019
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
Yulei Niu
Hanwang Zhang
Zhiwu Lu
Shih-Fu Chang
ObjD
BDL
36
24
0
08 Jul 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
27
377
0
20 May 2019
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
31
171
0
10 May 2019
ShapeGlot: Learning Language for Shape Differentiation
Panos Achlioptas
Judy Fan
Robert D. Hawkins
Noah D. Goodman
Leonidas J. Guibas
36
83
0
08 May 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
31
227
0
25 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel
Lillian Lee
David M. Mimno
31
30
0
16 Apr 2019
Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics
David Schlangen
33
6
0
15 Apr 2019
Evaluating the Representational Hub of Language and Vision Models
Ravi Shekhar
Ece Takmaz
Raquel Fernández
Raffaella Bernardi
30
11
0
12 Apr 2019
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Xihui Liu
Zihao Wang
Jing Shao
Xiaogang Wang
Hongsheng Li
ObjD
19
180
0
03 Mar 2019
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Hexiang Hu
Ishan Misra
Laurens van der Maaten
24
22
0
19 Jan 2019
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
Runtao Liu
Chenxi Liu
Yutong Bai
Alan Yuille
NAI
ObjD
22
123
0
03 Jan 2019
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
Peng Wang
Qi Wu
Jiewei Cao
Chunhua Shen
Lianli Gao
Anton Van Den Hengel
ObjD
22
252
0
12 Dec 2018