Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
J. Hockenmaier
Svetlana Lazebnik

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

50 / 374 papers shown
Title
Multi-Task Learning with Deep Neural Networks: A Survey
M. Crawshaw
CVBM
36
609
0
10 Sep 2020
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
33
228
0
27 Aug 2020
PhraseCut: Language-based Image Segmentation in the Wild
Chenyun Wu
Zhe Lin
Scott D. Cohen
Trung Bui
Subhransu Maji
VLM
13
111
0
03 Aug 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
50
93
0
19 Jul 2020
RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
Riccardo Del Chiaro
Bartlomiej Twardowski
Andrew D. Bagdanov
Joost van de Weijer
CLL
VLM
22
40
0
13 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
371
0
29 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
35
488
0
11 Jun 2020
Deep daxes: Mutual exclusivity arises through both learning biases and pragmatic strategies in neural networks
Kristina Gulordava
T. Brochhagen
Gemma Boleda
11
3
0
08 Apr 2020
Graph Structured Network for Image-Text Matching
Chunxiao Liu
Zhendong Mao
Tianzhu Zhang
Hongtao Xie
Bin Wang
Yongdong Zhang
19
232
0
01 Apr 2020
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Dave Zhenyu Chen
Angel X. Chang
Matthias Nießner
3DPC
27
347
0
18 Dec 2019
Grounding-Tracking-Integration
Zhengyuan Yang
T. Kumar
Tianlang Chen
Jinsong Su
Jiebo Luo
27
53
0
13 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
24
60
0
07 Dec 2019
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
21
15
0
06 Dec 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
31
295
0
06 Oct 2019
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
24
39
0
08 Sep 2019
Aesthetic Image Captioning From Weakly-Labelled Photographs
Koustav Ghosal
A. Rana
A. Smolic
27
25
0
29 Aug 2019
Phrase Localization Without Paired Training Examples
Josiah Wang
Lucia Specia
27
41
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
30
156
0
20 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
14
360
0
18 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
35
1,913
0
09 Aug 2019
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
Shantipriya Parida
Ondrej Bojar
S. Dash
25
62
0
21 Jul 2019
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
Yulei Niu
Hanwang Zhang
Zhiwu Lu
Shih-Fu Chang
ObjD
BDL
36
24
0
08 Jul 2019
Distilling Translations with Visual Awareness
Julia Ive
Pranava Madhyastha
Lucia Specia
VLM
22
76
0
18 Jun 2019
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain
Johanes Effendi
Andros Tjandra
S. Sakti
Satoshi Nakamura
19
3
0
03 Jun 2019
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
29
14
0
28 May 2019
Don't Blame Distributional Semantics if it can't do Entailment
M. Westera
Gemma Boleda
CoGe
9
20
0
17 May 2019
Deep Metric Learning Beyond Binary Supervision
Sungyeon Kim
Minkyo Seo
Ivan Laptev
Minsu Cho
Suha Kwak
SSL
15
94
0
21 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel
Lillian Lee
David M. Mimno
23
30
0
16 Apr 2019
Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics
David Schlangen
25
6
0
15 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
32
540
0
06 Apr 2019
Boosted Attention: Leveraging Human Attention for Image Captioning
Shi Chen
Qi Zhao
16
47
0
18 Mar 2019
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Hexiang Hu
Ishan Misra
L. V. D. van der Maaten
24
22
0
19 Jan 2019
Grounded Video Description
Luowei Zhou
Yannis Kalantidis
Xinlei Chen
Jason J. Corso
Marcus Rohrbach
27
190
0
17 Dec 2018
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
23
51
0
03 Dec 2018
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
27
865
0
27 Nov 2018
CUNI System for the WMT18 Multimodal Translation Task
Jindřich Helcl
Jindřich Libovický
Dušan Variš
11
57
0
12 Nov 2018
A Comprehensive Survey of Deep Learning for Image Captioning
Md. Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
33
760
0
06 Oct 2018
TVQA: Localized, Compositional Video Question Answering
Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg
34
616
0
05 Sep 2018
Doubly Attentive Transformer Machine Translation
Hasan Sait Arslan
Mark Fishel
G. Anbarjafari
27
13
0
30 Jul 2018
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
Zhou Yu
Jun Yu
Chenchao Xiang
Zhou Zhao
Q. Tian
Dacheng Tao
ObjD
18
138
0
09 May 2018
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
Raymond A. Yeh
Jinjun Xiong
Wen-mei W. Hwu
Minh Do
A. Schwing
22
57
0
29 Mar 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
A. Schwing
22
40
0
29 Mar 2018
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
200
434
0
27 Mar 2018
TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays
Xiaosong Wang
Yifan Peng
Le Lu
Zhiyong Lu
Ronald M. Summers
MedIm
27
462
0
12 Jan 2018
Grounding Referring Expressions in Images by Variational Context
Hanwang Zhang
Yulei Niu
Shih-Fu Chang
BDL
ObjD
21
219
0
05 Dec 2017
Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision
Mohamed Elhoseiny
Yizhe Zhu
Han Zhang
Ahmed Elgammal
VLM
30
132
0
04 Sep 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Chuang Gan
Yandong Li
Haoxiang Li
Chen Sun
Boqing Gong
27
126
0
15 Aug 2017
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
Xuwang Yin
Vicente Ordonez
VLM
32
55
0
22 Jul 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
15
2,859
0
26 May 2017
TALL: Temporal Activity Localization via Language Query
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
51
799
0
05 May 2017