1602.07332
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Papers citing
"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 1,654 papers shown
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
Edoardo Ponti
Ivan Vulić
MLLM
VLM
ELM
121
64
0
27 Jan 2022
Constrained Structure Learning for Scene Graph Generation
Daqing Liu
M. Bober
J. Kittler
3DV
CML
BDL
OCL
110
7
0
27 Jan 2022
RelTR: Relation Transformer for Scene Graph Generation
Yuren Cong
M. Yang
Bodo Rosenhahn
ViT
181
145
0
27 Jan 2022
SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering
Peixi Xiong
Quanzeng You
Pei Yu
Zicheng Liu
Ying Wu
63
5
0
25 Jan 2022
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Arjun Reddy Akula
OOD
116
3
0
24 Jan 2022
Supervised Visual Attention for Simultaneous Multimodal Machine Translation
Veneta Haralampieva
Ozan Caglayan
Lucia Specia
LRM
75
4
0
23 Jan 2022
Resistance Training using Prior Bias: toward Unbiased Scene Graph Generation
Chao Chen
Yibing Zhan
Baosheng Yu
Liu Liu
Yong Luo
Bo Du
72
42
0
18 Jan 2022
ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues
Hengcan Shi
Munawar Hayat
Yicheng Wu
Jianfei Cai
VLM
73
62
0
18 Jan 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
76
10
0
18 Jan 2022
Boosting Video Representation Learning with Multi-Faceted Integration
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Xiaoping Zhang
Dong Wu
Tao Mei
63
9
0
11 Jan 2022
Self-Training Vision Language BERTs with a Unified Conditional Model
Xiaofeng Yang
Fengmao Lv
Fayao Liu
Guosheng Lin
SSL
VLM
85
14
0
06 Jan 2022
Incremental Object Grounding Using Scene Graphs
J. Yi
Yoonwoo Kim
Sonia Chernova
LM&Ro
96
9
0
06 Jan 2022
Semantically Grounded Visual Embeddings for Zero-Shot Learning
Shah Nawaz
Jacopo Cavazza
Alessio Del Bue
ObjD
FedML
VLM
105
3
0
03 Jan 2022
Exploring Motion and Appearance Information for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Pan Zhou
Yang Liu
97
42
0
03 Jan 2022
Improving Out-of-Distribution Robustness via Selective Augmentation
Huaxiu Yao
Yu Wang
Sai Li
Linjun Zhang
Weixin Liang
James Zou
Chelsea Finn
OOD
OODD
108
224
0
02 Jan 2022
Multimodal Image Synthesis and Editing: The Generative AI Era
Fangneng Zhan
Yingchen Yu
Rongliang Wu
Jiahui Zhang
Shijian Lu
Lingjie Liu
Adam Kortylewski
Christian Theobalt
Eric Xing
EGVM
198
51
0
27 Dec 2021
Neuro-Symbolic Hierarchical Rule Induction
Claire Glanois
Xuening Feng
Zhaohui Jiang
Paul Weng
Matthieu Zimmer
Dong Li
Wulong Liu
NAI
73
24
0
26 Dec 2021
SGTR: End-to-end Scene Graph Generation with Transformer
Rongjie Li
Songyang Zhang
Xuming He
ViT
138
122
0
24 Dec 2021
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
125
102
0
23 Dec 2021
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi
Xiuye Gu
Huayu Chen
Nayeon Lee
VLM
185
387
0
22 Dec 2021
A Survey of Natural Language Generation
Chenhe Dong
Hai-Tao Zheng
Haifan Gong
Mengzhao Chen
Junxin Li
Ying Shen
Min Yang
3DV
85
45
0
22 Dec 2021
Structured Semantic Transfer for Multi-Label Recognition with Partial Labels
Tianshui Chen
Tao Pu
Hefeng Wu
Yuan Xie
Liang Lin
77
59
0
21 Dec 2021
Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey
Jiaoyan Chen
Yuxia Geng
Zhuo Chen
Jeff Z. Pan
Yuan He
Wen Zhang
Ian Horrocks
Hua-zeng Chen
134
49
0
18 Dec 2021
Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs
Shengyu Feng
Subarna Tripathi
Hesham Mostafa
Marcel Nassar
Somdeb Majumdar
94
26
0
18 Dec 2021
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
Guosheng Lin
110
194
0
17 Dec 2021
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLM
CLIP
53
34
0
17 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
153
584
0
16 Dec 2021
Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning
Thilini Cooray
Ngai-Man Cheung
93
6
0
16 Dec 2021
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang
Haoxuan You
Liunian Harold Li
Alireza Zareian
Suji Park
Yiqing Liang
Kai-Wei Chang
Shih-Fu Chang
ReLM
LRM
69
33
0
16 Dec 2021
3D Question Answering
Shuquan Ye
Dongdong Chen
Songfang Han
Jing Liao
ViT
94
49
0
15 Dec 2021
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Jianjian Cao
Xiameng Qin
Sanyuan Zhao
Jianbing Shen
72
21
0
14 Dec 2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
VPVLM
114
359
0
13 Dec 2021
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
86
114
0
12 Dec 2021
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
Tianyi Liu
Zuxuan Wu
Wenhan Xiong
Jingjing Chen
Yu-Gang Jiang
VLM
MLLM
88
10
0
10 Dec 2021
MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
C. Eichenberg
Sid Black
Samuel Weinbach
Letitia Parcalabescu
Anette Frank
MLLM
VLM
72
101
0
09 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT
VLM
86
91
0
09 Dec 2021
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
Yining Hong
Li Yi
J. Tenenbaum
Antonio Torralba
Chuang Gan
74
40
0
09 Dec 2021
CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning
Yue Fan
Dengxin Dai
Anna Kukleva
Bernt Schiele
76
46
0
08 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
154
719
0
08 Dec 2021
MLP Architectures for Vision-and-Language Modeling: An Empirical Study
Yi-Liang Nie
Linjie Li
Zhe Gan
Shuohang Wang
Chenguang Zhu
Michael Zeng
Zicheng Liu
Joey Tianyi Zhou
Lijuan Wang
62
6
0
08 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
178
1,071
0
07 Dec 2021
Embedding Arithmetic of Multimodal Queries for Image Retrieval
Guillaume Couairon
Matthieu Cord
Matthijs Douze
Holger Schwenk
86
24
0
06 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
124
133
0
02 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
96
24
0
02 Dec 2021
Consensus Graph Representation Learning for Better Grounded Image Captioning
Wenqiao Zhang
Haochen Shi
Siliang Tang
Jun Xiao
Qiang Yu
Yueting Zhuang
81
56
0
02 Dec 2021
Object-Centric Unsupervised Image Captioning
Zihang Meng
David Yang
Xuefei Cao
Ashish Shah
Ser-Nam Lim
OCL
VLM
80
12
0
02 Dec 2021
Relational Graph Learning for Grounded Video Description Generation
Wenqiao Zhang
Xinze Wang
Siliang Tang
Haizhou Shi
Haochen Shi
Jun Xiao
Yueting Zhuang
Wenjie Wang
67
33
0
02 Dec 2021
Object-aware Video-language Pre-training for Retrieval
Alex Jinpeng Wang
Yixiao Ge
Guanyu Cai
Rui Yan
Xudong Lin
Ying Shan
Xiaohu Qie
Mike Zheng Shou
ViT
VLM
70
82
0
01 Dec 2021
Weakly-Supervised Video Object Grounding via Causal Intervention
Wei Wang
Junyu Gao
Changsheng Xu
CML
104
22
0
01 Dec 2021
MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
Sara Atito
Muhammad Awais
Ammarah Farooq
Zhenhua Feng
J. Kittler
56
17
0
30 Nov 2021