Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1602.07332
Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 1,034 papers shown
Title
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
27
37
0
22 Sep 2019
Triplet-Aware Scene Graph Embeddings
Brigit Schroeder
Subarna Tripathi
Hanlin Tang
3DPC
33
16
0
19 Sep 2019
Scene Graph Parsing by Attention Graph
Martin Andrews
Yew Ken Chia
Sam Witteveen
GNN
30
11
0
13 Sep 2019
Specifying Object Attributes and Relations in Interactive Scene Generation
Oron Ashual
Lior Wolf
36
178
0
11 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
24
65
0
10 Sep 2019
Explainable Video Action Reasoning via Prior Knowledge and State Transitions
Tao Zhuo
Zhiyong Cheng
Peng Zhang
Yongkang Wong
Mohan Kankanhalli
FAtt
33
60
0
28 Aug 2019
Situational Fusion of Visual Representation for Visual Navigation
Bokui (William) Shen
Danfei Xu
Yuke Zhu
Leonidas J. Guibas
Fei-Fei Li
Silvio Savarese
SSL
27
62
0
24 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
58
1,650
0
22 Aug 2019
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
242
320
0
21 Aug 2019
Phrase Localization Without Paired Training Examples
Josiah Wang
Lucia Specia
35
41
0
20 Aug 2019
Image Synthesis From Reconfigurable Layout and Style
Wei Sun
Tianfu Wu
27
140
0
20 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
93
2,453
0
20 Aug 2019
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
F. Saleh
Hongdong Li
Stephen Gould
27
147
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
30
156
0
20 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
Shuang Ma
Daniel J. McDuff
Yale Song
25
22
0
19 Aug 2019
Attention on Attention for Image Captioning
Lun Huang
Wenmin Wang
Jie Chen
Xiao-Yong Wei
24
823
0
19 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
80
895
0
16 Aug 2019
Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song
Shizhe Chen
Yida Zhao
Qin Jin
SSL
26
40
0
15 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
27
38
0
12 Aug 2019
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Tan Wang
Xing Xu
Yang Yang
Alan Hanjalic
Heng Tao Shen
Jingkuan Song
22
145
0
12 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
67
1,917
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
87
3,624
0
06 Aug 2019
Aligning Linguistic Words and Visual Semantic Units for Image Captioning
Longteng Guo
Jing Liu
Jinhui Tang
Jiangwei Li
W. Luo
Hanqing Lu
25
102
0
06 Aug 2019
Cascaded Revision Network for Novel Object Captioning
Qianyu Feng
Yu Wu
Hehe Fan
C. Yan
Yezhou Yang
29
35
0
06 Aug 2019
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
Jing Wang
Yingwei Pan
Ting Yao
Jinhui Tang
Tao Mei
VLM
BDL
DiffM
19
36
0
01 Aug 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
23
50
0
28 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
20
132
0
22 Jul 2019
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
Shantipriya Parida
Ondrej Bojar
S. Dash
33
62
0
21 Jul 2019
CraftAssist: A Framework for Dialogue-enabled Interactive Agents
Jonathan Gray
Kavya Srinet
Yacine Jernite
Haonan Yu
Zhuoyuan Chen
Demi Guo
Siddharth Goyal
C. L. Zitnick
Arthur Szlam
33
39
0
19 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
36
796
0
25 Jun 2019
Integrating Knowledge and Reasoning in Image Understanding
Somak Aditya
Yezhou Yang
Chitta Baral
OCL
36
40
0
24 Jun 2019
Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019
Xiaohan Wang
Yu Wu
Linchao Zhu
Yi Yang
18
19
0
22 Jun 2019
The Limited Multi-Label Projection Layer
Brandon Amos
V. Koltun
J. Zico Kolter
19
36
0
20 Jun 2019
Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding
Jian Zheng
S. Krishnamurthy
Ruxin Chen
Min-Hung Chen
Zhenhao Ge
Xiaohua Li
40
4
0
16 Jun 2019
Does Learning Require Memorization? A Short Tale about a Long Tail
Vitaly Feldman
TDI
49
481
0
12 Jun 2019
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Zhengjun Zha
Daqing Liu
Hanwang Zhang
Yongdong Zhang
Feng Wu
25
119
0
06 Jun 2019
Relational Reasoning using Prior Knowledge for Visual Captioning
Jingyi Hou
Xinxiao Wu
Yayun Qi
Wentian Zhao
Jiebo Luo
Yunde Jia
17
14
0
04 Jun 2019
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
36
343
0
31 May 2019
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
29
14
0
28 May 2019
Self-Critical Reasoning for Robust Visual Question Answering
Jialin Wu
Raymond J. Mooney
OOD
NAI
32
159
0
24 May 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
34
81
0
15 May 2019
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
31
171
0
10 May 2019
PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
Yikang Li
Tao Ma
Yeqi Bai
Nan Duan
Sining Wei
Xiaogang Wang
28
93
0
05 May 2019
On Exploring Undetermined Relationships for Visual Relationship Detection
Yibing Zhan
Jun-chen Yu
Ting Yu
Dacheng Tao
31
82
0
05 May 2019
Scene Graph Prediction with Limited Labels
V. Chen
P. Varma
Ranjay Krishna
Michael S. Bernstein
Christopher Ré
Li Fei-Fei
16
86
0
25 Apr 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
31
227
0
25 Apr 2019
Deep Metric Learning Beyond Binary Supervision
Sungyeon Kim
Minkyo Seo
Ivan Laptev
Minsu Cho
Suha Kwak
SSL
20
94
0
21 Apr 2019
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
25
77
0
18 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel
Lillian Lee
David M. Mimno
31
30
0
16 Apr 2019
Previous
1
2
3
...
17
18
19
20
21
Next