1602.07332
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Papers citing
"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 1,648 papers shown
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
113
119
0
03 Jun 2021
Learning to Select: A Fully Attentive Approach for Novel Object Captioning
Marco Cagrandi
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
67
9
0
02 Jun 2021
ViTA: Visual-Linguistic Translation by Aligning Object Tags
Kshitij Gupta
Devansh Gautam
R. Mamidi
49
14
0
01 Jun 2021
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li
Jie Lei
Zhe Gan
Jingjing Liu
AAML
VLM
112
75
0
01 Jun 2021
Modeling Text-visual Mutual Dependency for Multi-modal Dialog Generation
Shuhe Wang
Yuxian Meng
Xiaofei Sun
Leilei Gan
Rongbin Ouyang
Rui Yan
Tianwei Zhang
Jiwei Li
66
15
0
30 May 2021
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering
Zujie Liang
Haifeng Hu
Jiaying Zhu
99
38
0
29 May 2021
Maintaining Common Ground in Dynamic Environments
Takuma Udagawa
Akiko Aizawa
48
13
0
29 May 2021
Linguistic Structures as Weak Supervision for Visual Scene Graph Generation
Keren Ye
Adriana Kovashka
64
54
0
28 May 2021
How saccadic vision might help with the interpretability of deep networks
Iana Sereda
Grigory V. Osipov
FAtt
46
0
0
27 May 2021
Maria: A Visual Experience Powered Conversational Agent
Zujie Liang
Huang Hu
Can Xu
Chongyang Tao
Xiubo Geng
Yining Chen
Fan Liang
Daxin Jiang
91
32
0
27 May 2021
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation
Tao Tu
Q. Ping
Govind Thattai
Gokhan Tur
Premkumar Natarajan
75
18
0
24 May 2021
SAT: 2D Semantics Assisted Training for 3D Visual Grounding
Zhengyuan Yang
Songyang Zhang
Liwei Wang
Jiebo Luo
3DPC
109
126
0
24 May 2021
Human-centric Relation Segmentation: Dataset and Solution
Si Liu
Zitian Wang
Yulu Gao
Lejian Ren
Yue Liao
Guanghui Ren
Bo Li
Shuicheng Yan
38
12
0
24 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
237
59
0
24 May 2021
Exemplar-Based Open-Set Panoptic Segmentation Network
Jaedong Hwang
Seoung Wug Oh
Joon-Young Lee
Bohyung Han
VLM
131
51
0
18 May 2021
Multi-Modal Image Captioning for the Visually Impaired
Hiba Ahsan
Nikita Bhalla
Daivat Bhatt
Kaivankumar Shah
76
23
0
17 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
138
142
0
17 May 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
K. Ueki
45
4
0
16 May 2021
Neural Trees for Learning on Graphs
Rajat Talak
Siyi Hu
Lisa Peng
Luca Carlone
99
25
0
15 May 2021
Plot and Rework: Modeling Storylines for Visual Storytelling
Chi-Yang Hsu
Yun-Wei Chu
Ting-Hao 'Kenneth' Huang
Lun-Wei Ku
65
31
0
14 May 2021
High-Resolution Complex Scene Synthesis with Transformers
Manuel Jahn
Robin Rombach
Bjorn Ommer
ViT
82
37
0
13 May 2021
Connecting What to Say With Where to Look by Modeling Human Attention Traces
Zihang Meng
Licheng Yu
Ning Zhang
Tamara L. Berg
Babak Damavandi
Vikas Singh
Amy Bearman
157
25
0
12 May 2021
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
Amanpreet Singh
Guan Pang
Mandy Toh
Jing Huang
Wojciech Galuba
Tal Hassner
86
174
0
12 May 2021
Cross-Modal Generative Augmentation for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
78
11
0
11 May 2021
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
Mathew Monfort
SouYoung Jin
Alexander H. Liu
David Harwath
Rogerio Feris
James Glass
Aude Oliva
56
60
0
10 May 2021
T-EMDE: Sketching-based global similarity for cross-modal retrieval
Barbara Rychalska
Mikolaj Wieczorek
Jacek Dąbrowski
59
0
0
10 May 2021
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
D. Guo
Ruiying Lu
Bo Chen
Zequn Zeng
Mingyuan Zhou
VLM
89
9
0
10 May 2021
Exploring Explicit and Implicit Visual Relationships for Image Captioning
Zeliang Song
Xiaofei Zhou
26
8
0
06 May 2021
Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries
Qingyi Dong
Zhuowen Tu
Haofu Liao
Yuting Zhang
Vijay Mahadevan
Stefano Soatto
ViT
77
38
0
05 May 2021
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads
Chenyu Gao
Qi Zhu
Peng Wang
Qi Wu
25
2
0
30 Apr 2021
Segmentation-grounded Scene Graph Generation
Siddhesh Khandelwal
M. Suhail
Leonid Sigal
89
28
0
29 Apr 2021
REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter
Hanbo Zhang
Deyu Yang
Han Wang
Binglei Zhao
Xuguang Lan
Jishiyu Ding
Nanning Zheng
82
41
0
29 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
249
897
0
26 Apr 2021
InfographicVQA
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
112
242
0
26 Apr 2021
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition
Jun Chen
Aniket Agarwal
Sherif Abdelkarim
Deyao Zhu
Mohamed Elhoseiny
ViT
150
17
0
24 Apr 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
142
56
0
23 Apr 2021
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
Guanghui Xu
Shuaicheng Niu
Mingkui Tan
Yucheng Luo
Qing Du
Qi Wu
DiffM
84
58
0
23 Apr 2021
GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering
Weixin Liang
Yanhao Jiang
Zixuan Liu
GNN
79
33
0
20 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
92
24
0
20 Apr 2021
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
Chunyan Miao
Houqiang Li
60
42
0
19 Apr 2021
Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction
Federico Bianchi
C. Greco
Jacopo Tagliabue
80
9
0
18 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
489
816
0
18 Apr 2021
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
VGen
67
125
0
16 Apr 2021
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Shir Gur
Natalia Neverova
C. Stauffer
Ser-Nam Lim
Douwe Kiela
A. Reiter
147
30
0
16 Apr 2021
Exploring Visual Engagement Signals for Representation Learning
Menglin Jia
Zuxuan Wu
A. Reiter
Claire Cardie
Serge Belongie
Ser-Nam Lim
82
13
0
15 Apr 2021
Zero-Shot Instance Segmentation
Ye Zheng
Jiahong Wu
Yongqiang Qin
Faen Zhang
Li Cui
ISeg
VLM
69
54
0
14 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
99
163
0
13 Apr 2021
StylePTB: A Compositional Benchmark for Fine-grained Controllable Text Style Transfer
Yiwei Lyu
Paul Pu Liang
Hai Pham
Eduard H. Hovy
Barnabas Poczos
Ruslan Salakhutdinov
Louis-Philippe Morency
66
44
0
12 Apr 2021
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
Yuankai Qi
Zizheng Pan
Yicong Hong
Ming-Hsuan Yang
Anton Van Den Hengel
Qi Wu
LM&Ro
82
69
0
09 Apr 2021
How Transferable are Reasoning Patterns in VQA?
Corentin Kervadec
Theo Jaunet
G. Antipov
M. Baccouche
Romain Vuillemot
Christian Wolf
LRM
59
28
0
08 Apr 2021