arXiv:1602.07332
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Papers citing
"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 1,645 papers shown
Title
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
William Berrios
Gautam Mittal
Tristan Thrush
Douwe Kiela
Amanpreet Singh
MLLM
VLM
72
61
0
28 Jun 2023
Hierarchical Matching and Reasoning for Multi-Query Image Retrieval
Zhong Ji
Zhihao Li
Yan Zhang
Haoran Wang
Yanwei Pang
Xuelong Li
69
11
0
26 Jun 2023
GeneCIS: A Benchmark for General Conditional Image Similarity
S. Vaze
Nicolas Carion
Ishan Misra
VLM
DiffM
100
30
0
13 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP
VLM
111
28
0
12 Jun 2023
Single-Stage Visual Relationship Learning using Conditional Queries
Alakh Desai
Tz-Ying Wu
Subarna Tripathi
Nuno Vasconcelos
86
7
0
09 Jun 2023
Multi-Modal Classifiers for Open-Vocabulary Object Detection
Prannay Kaul
Weidi Xie
Andrew Zisserman
ObjD
VLM
MLLM
73
47
0
08 Jun 2023
Diversifying Joint Vision-Language Tokenization Learning
Vardaan Pahuja
A. Piergiovanni
A. Angelova
71
0
0
06 Jun 2023
GRES: Generalized Referring Expression Segmentation
Chang Liu
Henghui Ding
Xudong Jiang
138
167
0
01 Jun 2023
Vocabulary-free Image Classification
Alessandro Conti
Enrico Fini
Massimiliano Mancini
Paolo Rota
Yiming Wang
Elisa Ricci
VLM
129
27
0
01 Jun 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
Ibrahim Said Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
92
7
0
28 May 2023
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing
Zhuang Li
Yuyang Chai
Terry Yue Zhuo
Gholamreza Haffari
Fei Li
Donghong Ji
Quan Hung Tran
115
33
0
27 May 2023
Modularized Zero-shot VQA with Pre-trained Models
Rui Cao
Jing Jiang
LRM
89
3
0
27 May 2023
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts
Yunshui Li
Binyuan Hui
Zhichao Yin
Min Yang
Fei Huang
Yongbin Li
MoE
87
21
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
116
8
0
24 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
103
5
0
23 May 2023
NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection
D. Herron
Ernesto Jiménez-Ruiz
G. Tarroni
Tillman Weyde
41
2
0
22 May 2023
Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch
Jindrich Libovický
63
8
0
17 May 2023
Hierarchical Aligned Multimodal Learning for NER on Tweet Posts
Peipei Liu
Hong Li
Yimo Ren
Jie Liu
Shuaizong Si
Hongsong Zhu
Limin Sun
72
5
0
15 May 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu
Jaemin Cho
Prateek Yadav
Joey Tianyi Zhou
147
142
0
11 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
112
38
0
05 May 2023
Interactive Acquisition of Fine-grained Visual Concepts by Exploiting Semantics of Generic Characterizations in Discourse
Jonghyuk Park
A. Lascarides
S. Ramamoorthy
VLM
54
2
0
05 May 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
136
112
0
17 Apr 2023
MoMo: A shared encoder Model for text, image and multi-Modal representations
Rakesh Chada
Zhao-Heng Zheng
P. Natarajan
ViT
64
4
0
11 Apr 2023
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Shentong Mo
Jingfei Xia
Ihor Markevych
CLIP
VLM
55
1
0
10 Apr 2023
V3Det: Vast Vocabulary Visual Detection Dataset
Jiaqi Wang
Pan Zhang
Tao Chu
Yuhang Cao
Yujie Zhou
Tong Wu
Bin Wang
Conghui He
Dahua Lin
VLM
ObjD
119
55
0
07 Apr 2023
Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
Yaochen Zhu
Xiangqing Shen
Rui Xia
119
4
0
05 Apr 2023
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
Paola Cascante-Bonilla
Khaled Shehada
James Smith
Sivan Doveh
Donghyun Kim
...
Gül Varol
A. Oliva
Vicente Ordonez
Rogerio Feris
Leonid Karlinsky
VLM
SyDa
125
42
0
30 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
134
169
0
28 Mar 2023
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
83
51
0
25 Mar 2023
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
Qifan Yu
Juncheng Li
Yuehua Wu
Siliang Tang
Wei Ji
Yueting Zhuang
105
38
0
23 Mar 2023
Open-Vocabulary Object Detection using Pseudo Caption Labels
Han-Cheol Cho
Won Young Jhoo
Woohyun Kang
Byungseok Roh
VLM
ObjD
60
20
0
23 Mar 2023
Location-Free Scene Graph Generation
Ege Özsoy
Felix Holm
Tobias Czempiel
Benjamin Busam
Nassir Navab
135
4
0
20 Mar 2023
Decomposed Prototype Learning for Few-Shot Scene Graph Generation
Xingchen Li
Long Chen
Guikun Chen
Yinfu Feng
Yi Yang
Jun Xiao
84
7
0
20 Mar 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
141
3
0
18 Mar 2023
A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
Hui Tang
Kui Jia
OOD
84
14
0
16 Mar 2023
Unified Visual Relationship Detection with Vision and Language Models
Long Zhao
Liangzhe Yuan
Boqing Gong
Huayu Chen
Florian Schroff
Ming-Hsuan Yang
Hartwig Adam
Ting Liu
ObjD
93
9
0
16 Mar 2023
Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs
Artem Savkin
Rachid Ellouze
Nassir Navab
F. Tombari
70
12
0
15 Mar 2023
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
Qian Jiang
Changyou Chen
Han Zhao
Liqun Chen
Q. Ping
S. D. Tran
Yi Xu
Belinda Zeng
Trishul Chilimbi
97
43
0
10 Mar 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
143
77
0
10 Mar 2023
Knowledge-augmented Few-shot Visual Relation Detection
Tianyu Yu
Yongqian Li
Jiaoyan Chen
Hai-Tao Zheng
Haitao Zheng
...
Qingbin Liu
Wenqiang Liu
Dongxiao Huang
Bei Wu
Yexin Wang
93
6
0
09 Mar 2023
Multimodal Prompting with Missing Modalities for Visual Recognition
Yi-Lun Lee
Yi-Hsuan Tsai
Wei-Chen Chiu
Chen-Yu Lee
VPVLM
105
104
0
06 Mar 2023
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kan Chen
Xiangqian Wu
CoGe
52
9
0
05 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
99
4
0
04 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
186
11
0
03 Mar 2023
TabGenie: A Toolkit for Table-to-Text Generation
Zdeněk Kasner
E. Garanina
Ondřej Plátek
Ondrej Dusek
LMTD
67
8
0
27 Feb 2023
Contrastive Video Question Answering via Video Graph Transformer
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
110
37
0
27 Feb 2023
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
Zhi Zhang
H. Yannakoudakis
Xiantong Zhen
Ekaterina Shutova
53
2
0
17 Feb 2023
LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation
Jiaxin Cheng
Xiao Liang
Xingjian Shi
Tong He
Tianjun Xiao
Mu Li
DiffM
82
69
0
16 Feb 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
80
29
0
16 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
95
7
0
16 Feb 2023