arXiv:1602.07332
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 1,103 papers shown
Title
Deep Interactive Region Segmentation and Captioning
Ali Sharifi Boroujerdi
M. Khanian
M. Breuß
24
7
0
26 Jul 2017
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
Xuwang Yin
Vicente Ordonez
VLM
40
55
0
22 Jul 2017
Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation
Jean-Benoit Delbrouck
Stéphane Dupont
Omar Seddati
25
8
0
04 Jul 2017
Pixels to Graphs by Associative Embedding
Alejandro Newell
Jia Deng
GNN
VOS
36
232
0
22 Jun 2017
Care about you: towards large-scale human-centric visual relationship detection
Bohan Zhuang
Qi Wu
Chunhua Shen
Ian Reid
Anton van den Hengel
14
21
0
28 May 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
67
578
0
18 May 2017
Inferring and Executing Programs for Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Judy Hoffman
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
NAI
23
541
0
10 May 2017
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
Xiaosong Wang
Yifan Peng
Le Lu
Zhiyong Lu
M. Bagheri
Ronald M. Summers
LM&MA
57
2,474
0
05 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
35
139
0
03 May 2017
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
Yuya Yoshikawa
Yutaro Shigeto
A. Takeuchi
3DV
30
118
0
02 May 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
65
1,218
0
02 May 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
24
11
0
24 Apr 2017
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets
Wei-Lun Chao
Hexiang Hu
Fei Sha
22
37
0
24 Apr 2017
Deep Reinforcement Learning-based Image Captioning with Embedding Reward
Zhou Ren
Xiaoyu Wang
Ning Zhang
Xutao Lv
Li-Jia Li
34
324
0
12 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
27
494
0
11 Apr 2017
Detecting Visual Relationships with Deep Relational Networks
Bo Dai
Yuqi Zhang
Dahua Lin
GNN
59
500
0
11 Apr 2017
An Analysis of Visual Question Answering Algorithms
Kushal Kafle
Christopher Kanan
30
231
0
28 Mar 2017
Recurrent Topic-Transition GAN for Visual Paragraph Generation
Xiaodan Liang
Zhiting Hu
Huatian Zhang
Chuang Gan
Eric Xing
GAN
27
200
0
21 Mar 2017
Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection
Xiaodan Liang
Lisa Lee
Eric Xing
29
250
0
08 Mar 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
154
560
0
27 Feb 2017
On the Origin of Deep Learning
Haohan Wang
Bhiksha Raj
MedIm
3DV
VLM
48
223
0
24 Feb 2017
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
24
386
0
19 Feb 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
46
273
0
30 Dec 2016
Learning Visual N-Grams from Web Data
Ang Li
Allan Jabri
Armand Joulin
Laurens van der Maaten
VLM
20
136
0
29 Dec 2016
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
23
2,322
0
20 Dec 2016
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Peng Wang
Qi Wu
Chunhua Shen
Anton van den Hengel
OOD
39
86
0
16 Dec 2016
The More You Know: Using Knowledge Graphs for Image Classification
Kenneth Marino
Ruslan Salakhutdinov
Abhinav Gupta
GNN
OCL
41
345
0
14 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
44
165
0
05 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
134
3,130
0
02 Dec 2016
Modeling Relationships in Referential Expressions with Compositional Modular Networks
Ronghang Hu
Marcus Rohrbach
Jacob Andreas
Trevor Darrell
Kate Saenko
42
402
0
30 Nov 2016
Sampled Image Tagging and Retrieval Methods on User Generated Content
Karl S. Ni
Kyle Zaragoza
Charles Foster
C. Carrano
Barry Y. Chen
Yonas Tesfaye
A. Gude
22
6
0
21 Nov 2016
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li-Jia Li
VLM
30
169
0
21 Nov 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs
J. Krause
Justin Johnson
Ranjay Krishna
Li Fei-Fei
VLM
36
373
0
20 Nov 2016
On Support Relations and Semantic Scene Graphs
M. Yang
Wentong Liao
H. Ackermann
Bodo Rosenhahn
GNN
19
60
0
19 Sep 2016
A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality
Kenji Hata
Ranjay Krishna
Fei-Fei Li
Michael S. Bernstein
53
42
0
15 Sep 2016
Learning to generalize to new compositions in image understanding
Y. Atzmon
Jonathan Berant
Vahid Kezami
Amir Globerson
Gal Chechik
26
67
0
27 Aug 2016
Solving Visual Madlibs with Multiple Cues
Tatiana Tommasi
Arun Mallya
Bryan A. Plummer
Svetlana Lazebnik
Alexander C. Berg
Tamara L. Berg
37
18
0
11 Aug 2016
Visual Relationship Detection with Language Priors
Cewu Lu
Ranjay Krishna
Michael S. Bernstein
Li Fei-Fei
VLM
16
1,134
0
31 Jul 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
36
1,885
0
29 Jul 2016
Much Ado About Time: Exhaustive Annotation of Temporal Data
Gunnar A. Sigurdsson
Olga Russakovsky
Ali Farhadi
Ivan Laptev
Abhinav Gupta
28
28
0
25 Jul 2016
FVQA: Fact-based Visual Question Answering
Peng Wang
Qi Wu
Chunhua Shen
Anton van den Hengel
A. Dick
CoGe
39
454
0
17 Jun 2016
Progressive Attention Networks for Visual Attribute Prediction
Paul Hongsuck Seo
Zhe-nan Lin
Scott D. Cohen
Xiaohui Shen
Bohyung Han
21
41
0
08 Jun 2016
Adversarial Feature Learning
Jeff Donahue
Philipp Krähenbühl
Trevor Darrell
GAN
56
1,824
0
31 May 2016
Data Programming: Creating Large Training Sets, Quickly
Alexander Ratner
Christopher De Sa
Sen Wu
Daniel Selsam
Christopher Ré
16
709
0
25 May 2016
Visual Storytelling
Ting-Hao 'Kenneth' Huang
Francis Ferraro
N. Mostafazadeh
Ishan Misra
...
C. L. Zitnick
Devi Parikh
Lucy Vanderwende
Michel Galley
Margaret Mitchell
VGen
22
464
0
13 Apr 2016
Measuring and Predicting Tag Importance for Image Retrieval
Shangwen Li
S. Purushotham
Chen Chen
Yuzhuo Ren
C.-C. Jay Kuo
31
32
0
28 Feb 2016
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
74
1,159
0
24 Nov 2015
Visual7W: Grounded Question Answering in Images
Yuke Zhu
Oliver Groth
Michael S. Bernstein
Li Fei-Fei
44
873
0
11 Nov 2015
Explicit Knowledge-based Reasoning for Visual Question Answering
Peng Wang
Qi Wu
Chunhua Shen
Anton van den Hengel
A. Dick
39
257
0
09 Nov 2015
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
54
1,315
0
07 Nov 2015