Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues

21 November 2016

Bryan A. Plummer

Arun Mallya

Christopher M. Cervantes

J. Hockenmaier

Svetlana Lazebnik

Papers citing "Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues"

Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark Ying Liu Yijing Hua Haojiang Chai Yanbo Wang TengQi Ye ObjD 62 0 0 19 Mar 2025
Language-Guided Diffusion Model for Visual Grounding Sijia Chen Baochun Li 37 5 0 18 Aug 2023
Adapting CLIP For Phrase Localization Without Further Training Jiahao Li G. Shakhnarovich Raymond A. Yeh VLM CLIP 30 25 0 07 Apr 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching Hengcan Shi Munawar Hayat Jianfei Cai ObjD 18 10 0 18 Jan 2022
Weakly-Supervised Video Object Grounding via Causal Intervention Wei Wang Junyu Gao Changsheng Xu CML 30 20 0 01 Dec 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Aishwarya Kamath Mannat Singh Yann LeCun Gabriel Synnaeve Ishan Misra Nicolas Carion ObjD VLM 57 858 0 26 Apr 2021
Towards General Purpose Vision Systems Tanmay Gupta Amita Kamath Aniruddha Kembhavi Derek Hoiem 11 49 0 01 Apr 2021
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning Simon Ging Mohammadreza Zolfaghari Hamed Pirsiavash Thomas Brox ViT CLIP 13 168 0 01 Nov 2020
Enriching Video Captions With Contextual Text Philipp Rimle Pelin Dogan Markus Gross 30 3 0 29 Jul 2020
Detecting Human-Object Interactions with Action Co-occurrence Priors Dong-Jin Kim Xiao Sun Jinsoo Choi Stephen Lin In So Kweon 18 124 0 17 Jul 2020
Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks Federico Baldassarre Kevin Smith Josephine Sullivan Hossein Azizpour 21 25 0 16 Jun 2020
A Real-time Global Inference Network for One-stage Referring Expression Comprehension Yiyi Zhou Rongrong Ji Gen Luo Xiaoshuai Sun Jinsong Su Xinghao Ding Chia-Wen Lin Q. Tian ObjD 24 60 0 07 Dec 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators Kuang-Huei Lee Hamid Palangi Xi Chen Houdong Hu Jianfeng Gao VLM 24 37 0 22 Sep 2019
Phrase Localization Without Paired Training Examples Josiah Wang Lucia Specia 27 41 0 20 Aug 2019
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation Zih-Siou Hung Arun Mallya Svetlana Lazebnik ViT 29 14 0 28 May 2019
From Recognition to Cognition: Visual Commonsense Reasoning Rowan Zellers Yonatan Bisk Ali Farhadi Yejin Choi LRM BDL OCL ReLM 27 865 0 27 Nov 2018
LinkNet: Relational Embedding for Scene Graph Sanghyun Woo Dahun Kim Donghyeon Cho In So Kweon GNN 13 147 0 15 Nov 2018
Context-Dependent Diffusion Network for Visual Relationship Detection Zhen Cui Chunyan Xu Wenming Zheng Jian Yang GNN 14 50 0 11 Sep 2018
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts Raymond A. Yeh Jinjun Xiong Wen-mei W. Hwu Minh Do A. Schwing 22 57 0 29 Mar 2018
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction Roei Herzig Moshiko Raboh Gal Chechik Jonathan Berant Amir Globerson GNN OCL 24 133 0 15 Feb 2018
Scene Graph Generation from Objects, Phrases and Region Captions Yikang Li Wanli Ouyang Bolei Zhou Kun Wang Xiaogang Wang 21 499 0 31 Jul 2017
Pixels to Graphs by Associative Embedding Alejandro Newell Jia Deng GNN VOS 22 232 0 22 Jun 2017
Visual Translation Embedding Network for Visual Relation Detection Hanwang Zhang Zawlin Kyaw Shih-Fu Chang Tat-Seng Chua ViT 148 560 0 27 Feb 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Akira Fukui Dong Huk Park Daylen Yang Anna Rohrbach Trevor Darrell Marcus Rohrbach 152 1,464 0 06 Jun 2016
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics Yunchao Gong Qifa Ke Michael Isard Svetlana Lazebnik 3DV 76 584 0 18 Dec 2012