1602.07332
Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Papers citing
"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 1,644 papers shown
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
158
274
0
07 Apr 2021
Image Composition Assessment with Saliency-augmented Multi-pattern Pooling
Bo Zhang
Li Niu
Liqing Zhang
CoGe
80
25
0
07 Apr 2021
Multimodal Continuous Visual Attention Mechanisms
António Farinhas
André F. T. Martins
P. Aguiar
64
7
0
07 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
116
99
0
05 Apr 2021
Procrustean Training for Imbalanced Deep Learning
Han-Jia Ye
De-Chuan Zhan
Wei-Lun Chao
65
31
0
05 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
64
26
0
02 Apr 2021
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
100
53
0
01 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
215
1,194
0
01 Apr 2021
Exploiting Relationship for Complex-scene Image Generation
Tianyu Hua
Hongdong Zheng
Yalong Bai
Wei Zhang
Xiaoping Zhang
Tao Mei
69
15
0
01 Apr 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM
VLM
97
92
0
01 Apr 2021
Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation
Rongjie Li
Songyang Zhang
Bo Wan
Xuming He
263
221
0
01 Apr 2021
Benchmarking Representation Learning for Natural World Image Collections
Grant Van Horn
Elijah Cole
Sara Beery
Kimberly Wilber
Serge J. Belongie
Oisin Mac Aodha
SSL
VLM
76
179
0
30 Mar 2021
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
99
121
0
30 Mar 2021
Fully Convolutional Scene Graph Generation
Hengyue Liu
Ning Yan
Masood S. Mortazavi
B. Bhanu
69
108
0
30 Mar 2021
Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays
Xiaosong Wang
Ziyue Xu
Leo K. Tam
Dong Yang
Daguang Xu
ViT
MedIm
68
24
0
30 Mar 2021
AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
Madeleine Grunde-McLaughlin
Ranjay Krishna
Maneesh Agrawala
CoGe
85
119
0
30 Mar 2021
Domain-robust VQA with diverse datasets and methods but no target labels
Ruotong Wang
Tristan D. Maidment
Ahmad Diab
Adriana Kovashka
R. Hwa
OOD
129
23
0
29 Mar 2021
Unified Graph Structured Models for Video Understanding
Anurag Arnab
Chen Sun
Cordelia Schmid
125
46
0
29 Mar 2021
Learning Generative Models of Textured 3D Meshes from Real-World Images
Dario Pavllo
Jonas Köhler
Thomas Hofmann
Aurelien Lucchi
3DV
3DH
109
51
0
29 Mar 2021
Visual Distant Supervision for Scene Graph Generation
Yuan Yao
Ao Zhang
Xu Han
Mengdi Li
C. Weber
Zhiyuan Liu
S. Wermter
Maosong Sun
70
39
0
29 Mar 2021
SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences
Shun-cheng Wu
Johanna Wald
Keisuke Tateno
Nassir Navab
Federico Tombari
3DPC
73
161
0
27 Mar 2021
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
71
17
0
27 Mar 2021
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Yongfei Liu
Bo Wan
Lin Ma
Xuming He
ObjD
94
57
0
24 Mar 2021
Human-like Controllable Image Captioning with Verb-specific Semantic Roles
Long Chen
Zhihong Jiang
Jun Xiao
Wei Liu
97
77
0
22 Mar 2021
#PraCegoVer: A Large Dataset for Image Captioning in Portuguese
G. O. D. Santos
Esther Luna Colombini
Sandra Avila
116
11
0
21 Mar 2021
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
Yonatan Bitton
Gabriel Stanovsky
Roy Schwartz
Michael Elhadad
CoGe
97
33
0
17 Mar 2021
A Comprehensive Survey of Scene Graphs: Generation and Application
Xiaojun Chang
Pengzhen Ren
Pengfei Xu
Zhihui Li
Xiaojiang Chen
Alexander G. Hauptmann
3DV
133
236
0
17 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
111
57
0
16 Mar 2021
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Siqi Sun
Yen-Chun Chen
Linjie Li
Shuohang Wang
Yuwei Fang
Jingjing Liu
VLM
89
84
0
16 Mar 2021
Knowledge driven Description Synthesis for Floor Plan Interpretation
Shreya Goyal
Chiranjoy Chattopadhyay
Gaurav Bhatnagar
3DV
27
13
0
15 Mar 2021
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images
Haolin Liu
Anran Lin
Xiaoguang Han
Lei Yang
Yizhou Yu
Shuguang Cui
84
40
0
14 Mar 2021
SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels
Chenliang Li
Ming Yan
Haiyang Xu
Fuli Luo
Wei Wang
Bin Bi
Songfang Huang
VLM
74
36
0
14 Mar 2021
Holistic 3D Scene Understanding from a Single Image with Implicit Representation
Cheng Zhang
Zhaopeng Cui
Yinda Zhang
B. Zeng
Marc Pollefeys
Shuaicheng Liu
139
107
0
11 Mar 2021
RL-CSDia: Representation Learning of Computer Science Diagrams
Shaowei Wang
LingLing Zhang
Xuan Luo
Yi Yang
Xin Hu
Jun Liu
3DV
35
2
0
10 Mar 2021
Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis
Chaoyi Zhang
Jianhui Yu
Yang Song
Weidong (Tom) Cai
3DPC
103
52
0
09 Mar 2021
Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation
Gengcong Yang
Jingyi Zhang
Yong Zhang
Baoyuan Wu
Yujiu Yang
89
67
0
09 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
140
39
0
06 Mar 2021
Causal Attention for Vision-Language Tasks
Xu Yang
Hanwang Zhang
Guojun Qi
Jianfei Cai
CML
101
157
0
05 Mar 2021
Learning Asynchronous and Sparse Human-Object Interaction in Videos
Romero Morais
Vuong Le
Svetha Venkatesh
T. Tran
46
30
0
03 Mar 2021
Energy-Based Learning for Scene Graph Generation
M. Suhail
Abhay Mittal
Behjat Siddiquie
Chris Broaddus
J. Eledath
Gérard Medioni
Leonid Sigal
106
166
0
03 Mar 2021
Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query
Guanyu Cai
Jun Zhang
Xinyang Jiang
Yifei Gong
Lianghua He
Fufu Yu
Pai Peng
Xiaowei Guo
Feiyue Huang
Xing Sun
77
13
0
02 Mar 2021
Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph
Xin Ye
Yezhou Yang
102
25
0
01 Mar 2021
KANDINSKYPatterns -- An experimental exploration environment for Pattern Analysis and Machine Intelligence
Andreas Holzinger
Anna Saranti
Heimo Mueller
114
10
0
28 Feb 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.1K
30,053
0
26 Feb 2021
Simple multi-dataset detection
Xingyi Zhou
V. Koltun
Philipp Krahenbuhl
ObjD
295
118
0
25 Feb 2021
Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges
Jian Ding
Nan Xue
Guisong Xia
X. Bai
Wen Yang
...
Serge J. Belongie
Jiebo Luo
Mihai Datcu
Marcello Pelillo
Lefei Zhang
248
413
0
24 Feb 2021
Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation
Julia Ive
A. Li
Yishu Miao
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
70
12
0
22 Feb 2021
UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu
Amanpreet Singh
ViT
106
301
0
22 Feb 2021
Learning Compositional Representation for Few-shot Visual Question Answering
Dalu Guo
Dacheng Tao
OOD
CoGe
64
4
0
21 Feb 2021
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
Jun Chen
Han Guo
Kai Yi
Boyang Albert Li
Mohamed Elhoseiny
VLM
164
227
0
20 Feb 2021