ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1511.02283
  4. Cited By
Generation and Comprehension of Unambiguous Object Descriptions

Generation and Comprehension of Unambiguous Object Descriptions

7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
    ObjD
ArXivPDFHTML

Papers citing "Generation and Comprehension of Unambiguous Object Descriptions"

50 / 274 papers shown
Title
Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
Manuel Kolmet
Qunjie Zhou
Aljosa Osep
Laura Leal-Taixe
27
23
0
28 Mar 2022
Emergence of hierarchical reference systems in multi-agent communication
Emergence of hierarchical reference systems in multi-agent communication
Xenia Ohmer
M. Duda
Elia Bruni
25
8
0
24 Mar 2022
Local-Global Context Aware Transformer for Language-Guided Video
  Segmentation
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
29
74
0
18 Mar 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
43
50
0
16 Mar 2022
Non-neural Models Matter: A Re-evaluation of Neural Referring Expression
  Generation Systems
Non-neural Models Matter: A Re-evaluation of Neural Referring Expression Generation Systems
F. Same
Guanyi Chen
Kees van Deemter
20
12
0
15 Mar 2022
Grounding Commands for Autonomous Vehicles via Layer Fusion with
  Region-specific Dynamic Layer Attention
Grounding Commands for Autonomous Vehicles via Layer Fusion with Region-specific Dynamic Layer Attention
Hou Pong Chan
M. Guo
Chengguang Xu
30
4
0
14 Mar 2022
Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for
  Temporal Sentence Grounding
Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding
Daizong Liu
Xiang Fang
Wei Hu
Pan Zhou
25
37
0
06 Mar 2022
Phrase-Based Affordance Detection via Cyclic Bilateral Interaction
Phrase-Based Affordance Detection via Cyclic Bilateral Interaction
Liangsheng Lu
Wei Zhai
Hongcheng Luo
Yu Kang
Yang Cao
21
19
0
24 Feb 2022
CAISE: Conversational Agent for Image Search and Editing
CAISE: Conversational Agent for Image Search and Editing
Hyounghun Kim
Doo Soon Kim
Seunghyun Yoon
Franck Dernoncourt
Trung Bui
Joey Tianyi Zhou
27
6
0
24 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple
  Sequence-to-Sequence Learning Framework
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
68
850
0
07 Feb 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal
  Matching
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
20
10
0
18 Jan 2022
Language as Queries for Referring Video Object Segmentation
Language as Queries for Referring Video Object Segmentation
Jiannan Wu
Yi-Xin Jiang
Pei Sun
Zehuan Yuan
Ping Luo
28
141
0
03 Jan 2022
Video as Conditional Graph Hierarchy for Multi-Granular Question
  Answering
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
30
111
0
12 Dec 2021
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip Torr
148
307
0
04 Dec 2021
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning
  and Visual Grounding
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Qirui Wu
Matthias Nießner
Angel X. Chang
21
29
0
02 Dec 2021
CRIS: CLIP-Driven Referring Image Segmentation
CRIS: CLIP-Driven Referring Image Segmentation
Zhaoqing Wang
Yu Lu
Qiang Li
Xunqiang Tao
Yan Guo
Ming Gong
Tongliang Liu
VLM
61
360
0
30 Nov 2021
End-to-End Referring Video Object Segmentation with Multimodal
  Transformers
End-to-End Referring Video Object Segmentation with Multimodal Transformers
Adam Botach
Evgenii Zheltonozhskii
Chaim Baskin
VOS
34
140
0
29 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language
  Modeling
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Faisal Ahmed
Zicheng Liu
Yumao Lu
Lijuan Wang
27
111
0
23 Nov 2021
MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image
  Segmentation
MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation
Zizhang Li
Mengmeng Wang
Jianbiao Mei
Yong Liu
20
18
0
21 Nov 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
208
221
0
24 Sep 2021
ReaSCAN: Compositional Reasoning in Language Grounding
ReaSCAN: Compositional Reasoning in Language Grounding
Zhengxuan Wu
Elisa Kreiss
Desmond C. Ong
Christopher Potts
CoGe
LRM
29
22
0
18 Sep 2021
Panoptic Narrative Grounding
Panoptic Narrative Grounding
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
82
22
0
10 Sep 2021
YouRefIt: Embodied Reference Understanding with Language and Gesture
YouRefIt: Embodied Reference Understanding with Language and Gesture
Yixin Chen
Qing Li
Deqian Kong
Yik Lun Kei
Song-Chun Zhu
Tao Gao
Yixin Zhu
Siyuan Huang
LM&Ro
37
41
0
08 Sep 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and
  Intra-modal Knowledge Integration
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
27
53
0
16 Aug 2021
Vision-Language Transformer and Query Generation for Referring
  Segmentation
Vision-Language Transformer and Query Generation for Referring Segmentation
Henghui Ding
Chang-rui Liu
Suchen Wang
Xudong Jiang
40
252
0
12 Aug 2021
LanguageRefer: Spatial-Language Model for 3D Visual Grounding
LanguageRefer: Spatial-Language Model for 3D Visual Grounding
Junha Roh
Karthik Desingh
Ali Farhadi
Dieter Fox
22
95
0
07 Jul 2021
Giving Commands to a Self-Driving Car: How to Deal with Uncertain
  Situations?
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations?
Thierry Deruyttere
Victor Milewski
Marie-Francine Moens
30
15
0
08 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual
  Grounding
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Muchen Li
Leonid Sigal
ObjD
13
188
0
06 Jun 2021
Cross-Modal Progressive Comprehension for Referring Segmentation
Cross-Modal Progressive Comprehension for Referring Segmentation
Si Liu
Tianrui Hui
Shaofei Huang
Yunchao Wei
Bo-wen Li
Guanbin Li
EgoV
VOS
28
123
0
15 May 2021
Encoder Fusion Network with Co-Attention Embedding for Referring Image
  Segmentation
Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation
Guang Feng
Zhiwei Hu
Lihe Zhang
Huchuan Lu
EgoV
25
168
0
05 May 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
69
863
0
26 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
28
330
0
17 Apr 2021
Towards General Purpose Vision Systems
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
11
50
0
01 Apr 2021
Co-Grounding Networks with Semantic Attention for Referring Expression
  Comprehension in Videos
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
21
16
0
23 Mar 2021
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
Qi Feng
Yunchao Wei
Mingming Cheng
Yi Yang
27
5
0
18 Mar 2021
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding
  on Point Clouds through Instance Multi-level Contextual Referring
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
Zhihao Yuan
Xu Yan
Yinghong Liao
Ruimao Zhang
Sheng Wang
Zhen Li
Shuguang Cui
71
129
0
01 Mar 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
299
1,084
0
17 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
277
525
0
04 Feb 2021
Transformers in Vision: A Survey
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
227
2,431
0
04 Jan 2021
Contrastive Learning with Adversarial Perturbations for Conditional Text
  Generation
Contrastive Learning with Adversarial Perturbations for Conditional Text Generation
Seanie Lee
Dong Bok Lee
Sung Ju Hwang
27
106
0
14 Dec 2020
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Dave Zhenyu Chen
A. Gholami
Matthias Nießner
Angel X. Chang
3DPC
23
159
0
03 Dec 2020
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework
  of Vision-and-Language BERTs
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
35
119
0
30 Nov 2020
Where Are You? Localization from Embodied Dialog
Where Are You? Localization from Embodied Dialog
Meera Hahn
Jacob Krantz
Dhruv Batra
Devi Parikh
James M. Rehg
Stefan Lee
Peter Anderson
LM&Ro
22
27
0
16 Nov 2020
Human-centric Spatio-Temporal Video Grounding With Visual Transformers
Human-centric Spatio-Temporal Video Grounding With Visual Transformers
Zongheng Tang
Yue Liao
Si Liu
Guanbin Li
Xiaojie Jin
Hongxu Jiang
Qian Yu
Dong Xu
21
94
0
10 Nov 2020
Refer, Reuse, Reduce: Generating Subsequent References in Visual and
  Conversational Contexts
Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts
Ece Takmaz
Mario Giulianelli
Sandro Pezzelle
Arabella J. Sinclair
Raquel Fernández
20
26
0
09 Nov 2020
Utilizing Every Image Object for Semi-supervised Phrase Grounding
Utilizing Every Image Object for Semi-supervised Phrase Grounding
Haidong Zhu
Arka Sadhu
Zhao-Heng Zheng
Ram Nevatia
ObjD
25
7
0
05 Nov 2020
Explaining Deep Neural Networks
Explaining Deep Neural Networks
Oana-Maria Camburu
XAI
FAtt
33
26
0
04 Oct 2020
RefVOS: A Closer Look at Referring Expressions for Video Object
  Segmentation
RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation
Míriam Bellver
Carles Ventura
Carina Silberer
Ioannis V. Kazakos
Jordi Torres
Xavier Giró-i-Nieto
VOS
29
32
0
01 Oct 2020
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary
Thierry Deruyttere
Simon Vandenhende
Dusan Grujicic
Yu Liu
Luc Van Gool
Matthew Blaschko
Tinne Tuytelaars
Marie-Francine Moens
30
6
0
18 Sep 2020
Towards Unique and Informative Captioning of Images
Towards Unique and Informative Captioning of Images
Zeyu Wang
Berthy Feng
Karthik R. Narasimhan
Olga Russakovsky
25
37
0
08 Sep 2020
Previous
123456
Next