Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1602.07332
Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 1,034 papers shown
Title
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering
T. M. Thai
Son T. Luu
40
0
0
22 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
27
47
0
22 Mar 2023
Location-Free Scene Graph Generation
Ege Özsoy
Felix Holm
Tobias Czempiel
Tobias Czempiel
Benjamin Busam
Nassir Navab
Benjamin Busam
50
4
0
20 Mar 2023
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
Shi Chen
Qi Zhao
47
5
0
18 Mar 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
29
3
0
18 Mar 2023
VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection
Arushi Rai
Adriana Kovashka
27
0
0
16 Mar 2023
Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs
Artem Savkin
Rachid Ellouze
Nassir Navab
F. Tombari
27
11
0
15 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
71
14
0
14 Mar 2023
ViM: Vision Middleware for Unified Downstream Transferring
Yutong Feng
Biao Gong
Jianwen Jiang
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
32
1
0
13 Mar 2023
Hierarchical Relationships: A New Perspective to Enhance Scene Graph Generation
Bowen Jiang
Camillo J Taylor
23
3
0
13 Mar 2023
Zero-Shot Object Searching Using Large-scale Object Relationship Prior
Hongyi Chen
Ruinian Xu
Shuo Cheng
Patricio A. Vela
Danfei Xu
LM&Ro
29
5
0
10 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
103
1,823
0
09 Mar 2023
Knowledge-augmented Few-shot Visual Relation Detection
Tianyu Yu
Yong Li
Jiaoyan Chen
Hai-Tao Zheng
Haitao Zheng
...
Qingbin Liu
Wenqiang Liu
Dongxiao Huang
Bei Wu
Yexin Wang
55
6
0
09 Mar 2023
Transformer-based Image Generation from Scene Graphs
Renato Sortino
S. Palazzo
C. Spampinato
ViT
59
15
0
08 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
33
14
0
07 Mar 2023
Multimodal Prompting with Missing Modalities for Visual Recognition
Yi-Lun Lee
Yi-Hsuan Tsai
Wei-Chen Chiu
Chen-Yu Lee
VPVLM
33
94
0
06 Mar 2023
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kan Chen
Xiangqian Wu
CoGe
24
8
0
05 Mar 2023
Knowledge-Based Counterfactual Queries for Visual Question Answering
Theodoti Stoikou
Maria Lymperaiou
Giorgos Stamou
AAML
31
1
0
05 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
44
21
0
04 Mar 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD
VLM
20
31
0
04 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
32
4
0
04 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
94
11
0
03 Mar 2023
Connecting Vision and Language with Video Localized Narratives
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
49
21
0
22 Feb 2023
Few-shot Multimodal Multitask Multilingual Learning
Aman Chadha
Vinija Jain
53
0
0
19 Feb 2023
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
Zhi Zhang
H. Yannakoudakis
Xiantong Zhen
Ekaterina Shutova
29
2
0
17 Feb 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
32
29
0
16 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
26
7
0
16 Feb 2023
Towards Local Visual Modeling for Image Captioning
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Rongrong Ji
ViT
21
71
0
13 Feb 2023
Context Understanding in Computer Vision: A Survey
Xuan Wang
Zhigang Zhu
24
46
0
10 Feb 2023
Controlling for Stereotypes in Multimodal Language Model Evaluation
Manuj Malik
Richard Johansson
23
1
0
03 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
Yizhen Chen
Jie Wang
Lijian Lin
Zhongang Qi
Jin Ma
Ying Shan
VLM
33
18
0
30 Jan 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
12
1
0
28 Jan 2023
Implicit Shape Model Trees: Recognition of 3-D Indoor Scenes and Prediction of Object Poses for Mobile Robots
Pascal Meissner
Rüdiger Dillmann
3DPC
21
0
0
25 Jan 2023
OvarNet: Towards Open-vocabulary Object Attribute Recognition
Keyan Chen
Xiaolong Jiang
Yao Hu
Xu Tang
Yan Gao
Jianqi Chen
Weidi Xie
VLM
ObjD
40
40
0
23 Jan 2023
Effective End-to-End Vision Language Pretraining with Semantic Visual Loss
Xiaofeng Yang
Fayao Liu
Guosheng Lin
VLM
26
7
0
18 Jan 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation
Yuheng Li
Haotian Liu
Qingyang Wu
Fangzhou Mu
Jianwei Yang
Jianfeng Gao
Chunyuan Li
Yong Jae Lee
VLM
80
569
1
17 Jan 2023
USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval
Yan Zhang
Zhong Ji
Dingrong Wang
Yanwei Pang
Xuelong Li
VLM
24
22
0
17 Jan 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Songlin Yang
Yining Hong
Hao Zhang
Chuang Gan
LRM
VLM
35
36
0
12 Jan 2023
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Xinsong Zhang
Yan Zeng
Jipeng Zhang
Hang Li
VLM
AI4CE
LRM
22
17
0
12 Jan 2023
Graph based Environment Representation for Vision-and-Language Navigation in Continuous Environments
Ting Wang
Zongkai Wu
Feiyu Yao
Donglin Wang
51
5
0
11 Jan 2023
Learning Trajectory-Word Alignments for Video-Language Tasks
Xu Yang
Zhang Li
Haiyang Xu
Hanwang Zhang
Qinghao Ye
Chenliang Li
Ming Yan
Yu Zhang
Fei Huang
Songfang Huang
36
7
0
05 Jan 2023
PACO: Parts and Attributes of Common Objects
Vignesh Ramanathan
Anmol Kalia
Vladan Petrovic
Yiqian Wen
Baixue Zheng
...
Abhishek Kadian
Amir Mousavi
Yi-Zhe Song
Abhimanyu Dubey
D. Mahajan
VLM
30
95
0
04 Jan 2023
Skew Class-balanced Re-weighting for Unbiased Scene Graph Generation
Haeyong Kang
Chang D. Yoo
34
6
0
01 Jan 2023
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Harsh Rangwani
Sumukh K Aithal
Mayank Mishra
R. Venkatesh Babu
31
29
0
28 Dec 2022
Knowledge-driven Scene Priors for Semantic Audio-Visual Embodied Navigation
Gyan Tatiya
Jonathan M Francis
Luca Bondi
Ingrid Navarro
Eric Nyberg
Jivko Sinapov
Jean Oh
30
8
0
21 Dec 2022
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
32
110
0
21 Dec 2022
Position-guided Text Prompt for Vision-Language Pre-training
Alex Jinpeng Wang
Pan Zhou
Mike Zheng Shou
Shuicheng Yan
VLM
24
37
0
19 Dec 2022
Universal Object Detection with Large Vision Model
Feng-Huei Lin
Wenze Hu
Yaowei Wang
Yonghong Tian
Guangming Lu
Fanglin Chen
Yong-mei Xu
Xiaoyu Wang
VLM
ObjD
32
8
0
19 Dec 2022
Transferring General Multimodal Pretrained Models to Text Recognition
Junyang Lin
Xuancheng Ren
Yichang Zhang
Gao Liu
Peng Wang
An Yang
Chang Zhou
34
4
0
19 Dec 2022
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
26
4
0
16 Dec 2022
Previous
1
2
3
...
5
6
7
...
19
20
21
Next