Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.04870
Cited By
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
J. Hockenmaier
Svetlana Lazebnik
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"
24 / 374 papers shown
Title
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
27
138
0
03 May 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
27
494
0
11 Apr 2017
Detecting Visual Relationships with Deep Relational Networks
Bo Dai
Yuqi Zhang
Dahua Lin
GNN
56
499
0
11 Apr 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
154
560
0
27 Feb 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
35
272
0
30 Dec 2016
Top-down Visual Saliency Guided by Captions
Vasili Ramanishka
Abir Das
Jianming Zhang
Kate Saenko
19
142
0
21 Dec 2016
An Empirical Study of Language CNN for Image Captioning
Jiuxiang Gu
G. Wang
Jianfei Cai
Tsuhan Chen
23
132
0
21 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
41
165
0
05 Dec 2016
Areas of Attention for Image Captioning
M. Pedersoli
Thomas Lucas
Cordelia Schmid
Jakob Verbeek
25
205
0
03 Dec 2016
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
52
990
0
26 Nov 2016
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM
Yan Huang
Wei Wang
Liang Wang
26
222
0
17 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
34
664
0
02 Nov 2016
Optimizing Open-Ended Crowdsourcing: The Next Frontier in Crowdsourced Data Management
Aditya G. Parameswaran
Akash Das Sarma
Vipul Venkataraman
6
11
0
17 Oct 2016
Solving Visual Madlibs with Multiple Cues
Tatiana Tommasi
Arun Mallya
Bryan A. Plummer
Svetlana Lazebnik
Alexander C. Berg
Tamara L. Berg
34
18
0
11 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
34
1,884
0
29 Jul 2016
CUNI System for WMT16 Automatic Post-Editing and Multimodal Translation Tasks
Jindrich Libovický
Jindřich Helcl
Marek Tlustý
Pavel Pecina
Ondrej Bojar
6
67
0
23 Jun 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
152
1,464
0
06 Jun 2016
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
42
1,159
0
24 Nov 2015
Order-Embeddings of Images and Language
Ivan Vendrov
Ryan Kiros
Sanja Fidler
R. Urtasun
26
542
0
19 Nov 2015
Sherlock: Scalable Fact Learning in Images
Mohamed Elhoseiny
Scott D. Cohen
W. Chang
Brian L. Price
Ahmed Elgammal
16
26
0
16 Nov 2015
Visual7W: Grounded Question Answering in Images
Yuke Zhu
Oliver Groth
Michael S. Bernstein
Li Fei-Fei
44
870
0
11 Nov 2015
Multimodal Convolutional Neural Networks for Matching Image and Sentence
Lin Ma
Zhengdong Lu
Lifeng Shang
Hang Li
35
337
0
23 Apr 2015
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
27
5,992
0
17 Nov 2014
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics
Yunchao Gong
Qifa Ke
Michael Isard
Svetlana Lazebnik
3DV
76
584
0
18 Dec 2012
Previous
1
2
3
4
5
6
7
8