Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015

Bryan A. Plummer

Liwei Wang

Christopher M. Cervantes

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

24 / 374 papers shown

Title
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures Fanyi Xiao Leonid Sigal Yong Jae Lee 27 138 0 03 May 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks Liwei Wang Yin Li Jing-ling Huang Svetlana Lazebnik VLM 27 494 0 11 Apr 2017
Detecting Visual Relationships with Deep Relational Networks Bo Dai Yuqi Zhang Dahua Lin GNN 56 499 0 11 Apr 2017
Visual Translation Embedding Network for Visual Relation Detection Hanwang Zhang Zawlin Kyaw Shih-Fu Chang Tat-Seng Chua ViT 154 560 0 27 Feb 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions Licheng Yu Hao Tan Joey Tianyi Zhou Tamara L. Berg ObjD 35 272 0 30 Dec 2016
Top-down Visual Saliency Guided by Captions Vasili Ramanishka Abir Das Jianming Zhang Kate Saenko 19 142 0 21 Dec 2016
An Empirical Study of Language CNN for Image Captioning Jiuxiang Gu G. Wang Jianfei Cai Tsuhan Chen 23 132 0 21 Dec 2016
ImageNet pre-trained models with batch normalization Marcel Simon E. Rodner Joachim Denzler VLM SSeg 41 165 0 05 Dec 2016
Areas of Attention for Image Captioning M. Pedersoli Thomas Lucas Cordelia Schmid Jakob Verbeek 25 205 0 03 Dec 2016
Visual Dialog Abhishek Das Satwik Kottur Khushi Gupta Avi Singh Deshraj Yadav José M. F. Moura Devi Parikh Dhruv Batra 52 990 0 26 Nov 2016
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM Yan Huang Wei Wang Liang Wang 26 222 0 17 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching Hyeonseob Nam Jung-Woo Ha Jeonghee Kim 34 664 0 02 Nov 2016
Optimizing Open-Ended Crowdsourcing: The Next Frontier in Crowdsourced Data Management Aditya G. Parameswaran Akash Das Sarma Vipul Venkataraman 6 11 0 17 Oct 2016
Solving Visual Madlibs with Multiple Cues Tatiana Tommasi Arun Mallya Bryan A. Plummer Svetlana Lazebnik Alexander C. Berg Tamara L. Berg 34 18 0 11 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation Peter Anderson Basura Fernando Mark Johnson Stephen Gould EGVM 34 1,884 0 29 Jul 2016
CUNI System for WMT16 Automatic Post-Editing and Multimodal Translation Tasks Jindrich Libovický Jindřich Helcl Marek Tlustý Pavel Pecina Ondrej Bojar 6 67 0 23 Jun 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Akira Fukui Dong Huk Park Daylen Yang Anna Rohrbach Trevor Darrell Marcus Rohrbach 152 1,464 0 06 Jun 2016
DenseCap: Fully Convolutional Localization Networks for Dense Captioning Justin Johnson A. Karpathy Li Fei-Fei VLM 42 1,159 0 24 Nov 2015
Order-Embeddings of Images and Language Ivan Vendrov Ryan Kiros Sanja Fidler R. Urtasun 26 542 0 19 Nov 2015
Sherlock: Scalable Fact Learning in Images Mohamed Elhoseiny Scott D. Cohen W. Chang Brian L. Price Ahmed Elgammal 16 26 0 16 Nov 2015
Visual7W: Grounded Question Answering in Images Yuke Zhu Oliver Groth Michael S. Bernstein Li Fei-Fei 44 870 0 11 Nov 2015
Multimodal Convolutional Neural Networks for Matching Image and Sentence Lin Ma Zhengdong Lu Lifeng Shang Hang Li 35 337 0 23 Apr 2015
Show and Tell: A Neural Image Caption Generator Oriol Vinyals Alexander Toshev Samy Bengio D. Erhan 3DV 27 5,992 0 17 Nov 2014
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics Yunchao Gong Qifa Ke Michael Isard Svetlana Lazebnik 3DV 76 584 0 18 Dec 2012