ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.04870
  4. Cited By
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for
  Richer Image-to-Sentence Models

Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
J. Hockenmaier
Svetlana Lazebnik
ArXivPDFHTML

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

24 / 374 papers shown
Title
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
27
138
0
03 May 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
27
494
0
11 Apr 2017
Detecting Visual Relationships with Deep Relational Networks
Detecting Visual Relationships with Deep Relational Networks
Bo Dai
Yuqi Zhang
Dahua Lin
GNN
56
499
0
11 Apr 2017
Visual Translation Embedding Network for Visual Relation Detection
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
154
560
0
27 Feb 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
35
272
0
30 Dec 2016
Top-down Visual Saliency Guided by Captions
Top-down Visual Saliency Guided by Captions
Vasili Ramanishka
Abir Das
Jianming Zhang
Kate Saenko
19
142
0
21 Dec 2016
An Empirical Study of Language CNN for Image Captioning
An Empirical Study of Language CNN for Image Captioning
Jiuxiang Gu
G. Wang
Jianfei Cai
Tsuhan Chen
23
132
0
21 Dec 2016
ImageNet pre-trained models with batch normalization
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
41
165
0
05 Dec 2016
Areas of Attention for Image Captioning
Areas of Attention for Image Captioning
M. Pedersoli
Thomas Lucas
Cordelia Schmid
Jakob Verbeek
25
205
0
03 Dec 2016
Visual Dialog
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
52
990
0
26 Nov 2016
Instance-aware Image and Sentence Matching with Selective Multimodal
  LSTM
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM
Yan Huang
Wei Wang
Liang Wang
26
222
0
17 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
34
664
0
02 Nov 2016
Optimizing Open-Ended Crowdsourcing: The Next Frontier in Crowdsourced
  Data Management
Optimizing Open-Ended Crowdsourcing: The Next Frontier in Crowdsourced Data Management
Aditya G. Parameswaran
Akash Das Sarma
Vipul Venkataraman
6
11
0
17 Oct 2016
Solving Visual Madlibs with Multiple Cues
Solving Visual Madlibs with Multiple Cues
Tatiana Tommasi
Arun Mallya
Bryan A. Plummer
Svetlana Lazebnik
Alexander C. Berg
Tamara L. Berg
34
18
0
11 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
34
1,884
0
29 Jul 2016
CUNI System for WMT16 Automatic Post-Editing and Multimodal Translation
  Tasks
CUNI System for WMT16 Automatic Post-Editing and Multimodal Translation Tasks
Jindrich Libovický
Jindřich Helcl
Marek Tlustý
Pavel Pecina
Ondrej Bojar
6
67
0
23 Jun 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
152
1,464
0
06 Jun 2016
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
42
1,159
0
24 Nov 2015
Order-Embeddings of Images and Language
Order-Embeddings of Images and Language
Ivan Vendrov
Ryan Kiros
Sanja Fidler
R. Urtasun
26
542
0
19 Nov 2015
Sherlock: Scalable Fact Learning in Images
Sherlock: Scalable Fact Learning in Images
Mohamed Elhoseiny
Scott D. Cohen
W. Chang
Brian L. Price
Ahmed Elgammal
16
26
0
16 Nov 2015
Visual7W: Grounded Question Answering in Images
Visual7W: Grounded Question Answering in Images
Yuke Zhu
Oliver Groth
Michael S. Bernstein
Li Fei-Fei
44
870
0
11 Nov 2015
Multimodal Convolutional Neural Networks for Matching Image and Sentence
Multimodal Convolutional Neural Networks for Matching Image and Sentence
Lin Ma
Zhengdong Lu
Lifeng Shang
Hang Li
35
337
0
23 Apr 2015
Show and Tell: A Neural Image Caption Generator
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
27
5,992
0
17 Nov 2014
A Multi-View Embedding Space for Modeling Internet Images, Tags, and
  their Semantics
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics
Yunchao Gong
Qifa Ke
Michael Isard
Svetlana Lazebnik
3DV
76
584
0
18 Dec 2012
Previous
12345678