1602.07332
Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Papers citing
"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 1,644 papers shown
Title
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
110
36
0
04 Dec 2020
Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D
Ankit Goyal
Kaiyu Yang
Dawei Yang
Jia Deng
91
42
0
03 Dec 2020
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
102
120
0
30 Nov 2020
Language-Driven Region Pointer Advancement for Controllable Image Captioning
Annika Lindh
R. Ross
John D. Kelleher
43
14
0
30 Nov 2020
Self-Supervised Real-to-Sim Scene Generation
Aayush Prakash
Shoubhik Debnath
Jean-Francois Lafleche
Eric Cameracci
Gavriel State
Stan Birchfield
M. Law
82
26
0
30 Nov 2020
General Multi-label Image Classification with Transformers
Jack Lanchantin
Tianlu Wang
Vicente Ordonez
Yanjun Qi
ViT
80
268
0
27 Nov 2020
Road Scene Graph: A Semantic Graph-Based Scene Representation Dataset for Intelligent Vehicles
Yafu Tian
Alexander Carballo
Ruifeng Li
K. Takeda
GNN
89
27
0
27 Nov 2020
Learning from Lexical Perturbations for Consistent Visual Question Answering
Spencer Whitehead
Hui Wu
Yi R. Fung
Heng Ji
Rogerio Feris
Kate Saenko
68
11
0
26 Nov 2020
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong
Qi Wu
Yuankai Qi
Cristian Rodriguez-Opazo
Stephen Gould
LM&Ro
128
303
0
26 Nov 2020
Open-Vocabulary Object Detection Using Captions
Alireza Zareian
Kevin Dela Rosa
Derek Hao Hu
Shih-Fu Chang
VLM
ObjD
187
436
0
20 Nov 2020
Classification by Attention: Scene Graph Classification with Prior Knowledge
Sahand Sharifzadeh
Sina Moayed Baharlou
Volker Tresp
OCL
76
52
0
19 Nov 2020
Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision
Yujie Zhong
Linhai Xie
Sen Wang
Lucia Specia
Yishu Miao
SSL
26
0
0
19 Nov 2020
Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions
Jianan Wang
Boyang Albert Li
Xiangyu Fan
Jing-Hua Lin
Yanwei Fu
46
2
0
15 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
134
423
0
14 Nov 2020
Human-centric Spatio-Temporal Video Grounding With Visual Transformers
Zongheng Tang
Yue Liao
Si Liu
Guanbin Li
Xiaojie Jin
Hongxu Jiang
Qian Yu
Dong Xu
68
99
0
10 Nov 2020
After All, Only The Last Neuron Matters: Comparing Multi-modal Fusion Functions for Scene Graph Generation
Mohamed Karim Belaid
89
1
0
09 Nov 2020
Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts
Ece Takmaz
Mario Giulianelli
Sandro Pezzelle
Arabella J. Sinclair
Raquel Fernández
98
26
0
09 Nov 2020
CapWAP: Captioning with a Purpose
Adam Fisch
Kenton Lee
Ming-Wei Chang
J. Clark
Regina Barzilay
53
11
0
09 Nov 2020
Dual ResGCN for Balanced Scene Graph Generation
Jingyi Zhang
Yong Zhang
Baoyuan Wu
Yanbo Fan
Fumin Shen
Heng Tao Shen
67
12
0
09 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
44
45
0
04 Nov 2020
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings
Yue Wang
Jing Li
Michael R. Lyu
Irwin King
75
16
0
03 Nov 2020
Diverse Image Captioning with Context-Object Split Latent Spaces
Shweta Mahajan
Stefan Roth
64
42
0
02 Nov 2020
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
Zanxia Jin
Heran Wu
Chun Yang
Fang Zhou
Jingyan Qin
Lei Xiao
Xu-Cheng Yin
88
31
0
24 Oct 2020
Show and Speak: Directly Synthesize Spoken Description of Images
Xinsheng Wang
Siyuan Feng
Jihua Zhu
M. Hasegawa-Johnson
O. Scharenborg
152
4
0
23 Oct 2020
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching
Keyu Wen
Xiaodong Gu
Qingrong Cheng
76
97
0
22 Oct 2020
Contextual Heterogeneous Graph Network for Human-Object Interaction Detection
Hai Wang
Weishi Zheng
Yingbiao Ling
88
88
0
20 Oct 2020
Language and Visual Entity Relationship Graph for Agent Navigation
Yicong Hong
Cristian Rodriguez-Opazo
Yuankai Qi
Qi Wu
Stephen Gould
LM&Ro
226
134
0
19 Oct 2020
Answer-checking in Context: A Multi-modal Fully Attention Network for Visual Question Answering
Hantao Huang
Tao Han
Wei Han
D. Yap
Cheng-Ming Chiang
28
4
0
17 Oct 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Hao Tan
Joey Tianyi Zhou
CLIP
89
121
0
14 Oct 2020
Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!
Jack Hessel
Lillian Lee
108
75
0
13 Oct 2020
CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations
Fuli Luo
Pengcheng Yang
Shicheng Li
Xuancheng Ren
Xu Sun
VLM
SSL
73
16
0
13 Oct 2020
DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Basura Fernando
Hongdong Li
Stephen Gould
192
10
0
13 Oct 2020
Webly Supervised Image Classification with Metadata: Automatic Noisy Label Correction via Visual-Semantic Graph
Jingkang Yang
Weirong Chen
Xue Jiang
Xiaopeng Yan
Huabin Zheng
Wayne Zhang
NoLa
74
13
0
12 Oct 2020
Beyond Language: Learning Commonsense from Images for Reasoning
Wanqing Cui
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
71
5
0
10 Oct 2020
Background Learnable Cascade for Zero-Shot Object Detection
Ye Zheng
Ruoran Huang
Chuanqi Han
Xi Huang
Li Cui
ObjD
123
48
0
09 Oct 2020
Dense Relational Image Captioning via Multi-task Triple-Stream Networks
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
115
27
0
08 Oct 2020
Pathological Visual Question Answering
Xuehai He
Zhuo Cai
Wenlan Wei
Yichen Zhang
Luntian Mou
Eric Xing
P. Xie
140
24
0
06 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
40
2
0
05 Oct 2020
Multi-Modal Open-Domain Dialogue
Kurt Shuster
Eric Michael Smith
Da Ju
Jason Weston
AI4CE
137
44
0
02 Oct 2020
CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns
L. Ferreira
Douglas De Rizzo Meneghetti
P. Santos
21
2
0
02 Oct 2020
Learning Object Detection from Captions via Textual Scene Attributes
Achiya Jerbi
Roei Herzig
Jonathan Berant
Gal Chechik
Amir Globerson
79
21
0
30 Sep 2020
Attention that does not Explain Away
Nan Ding
Xinjie Fan
Zhenzhong Lan
Dale Schuurmans
Radu Soricut
54
3
0
29 Sep 2020
Spatial Attention as an Interface for Image Captioning Models
P. Sadler
51
0
0
29 Sep 2020
Addressing Class Imbalance in Scene Graph Parsing by Learning to Contrast and Score
He Huang
Shunta Saito
Yuta Kikuchi
Eiichi Matsumoto
Wei Tang
Philip S. Yu
36
5
0
28 Sep 2020
Human-Object Interaction Detection: A Quick Survey and Examination of Methods
T. Bergstrom
Humphrey Shi
ObjD
36
12
0
27 Sep 2020
SceneGen: Generative Contextual Scene Augmentation using Scene Graph Priors
Mohammad Keshavarzi
Aakash Parikh
Xiyu Zhai
Melody Mao
Luisa Caldas
An Yang
79
24
0
25 Sep 2020
Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases
Gerhard Weikum
Luna Dong
Simon Razniewski
Fabian M. Suchanek
144
128
0
24 Sep 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho
Jiasen Lu
Dustin Schwenk
Hannaneh Hajishirzi
Aniruddha Kembhavi
VLM
MLLM
95
102
0
23 Sep 2020
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering
Tuong Khanh Long Do
Binh X. Nguyen
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
Thanh-Toan Do
40
2
0
23 Sep 2020
ALICE: Active Learning with Contrastive Natural Language Explanations
Weixin Liang
James Zou
Zhou Yu
VLM
105
51
0
22 Sep 2020