arXiv:1602.07332
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Papers citing
"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 1,655 papers shown
MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
Sara Atito
Muhammad Awais
Ammarah Farooq
Zhenhua Feng
J. Kittler
56
17
0
30 Nov 2021
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
Stan Weixian Lei
Difei Gao
Yuxuan Wang
Dongxing Mao
Zihan Liang
L. Ran
Mike Zheng Shou
69
8
0
30 Nov 2021
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
Jingjing Jiang
Zi-yi Liu
N. Zheng
87
14
0
29 Nov 2021
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
122
197
0
29 Nov 2021
Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
108
33
0
26 Nov 2021
Scene Graph Generation with Geometric Context
Vishal Kumar
Albert Mundu
S. Singh
GNN
3DV
37
2
0
25 Nov 2021
Detecting and Tracking Small and Dense Moving Objects in Satellite Videos: A Benchmark
Qian Yin
Qingyong Hu
Hao Liu
Feng Zhang
Yingqian Wang
Zaiping Lin
Wei An
Yulan Guo
113
92
0
25 Nov 2021
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
Marcella Cornia
Lorenzo Baraldi
G. Fiameni
Rita Cucchiara
109
12
0
24 Nov 2021
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
154
221
0
24 Nov 2021
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
127
70
0
24 Nov 2021
Scaling Up Vision-Language Pre-training for Image Captioning
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Zhengyuan Yang
Zicheng Liu
Yumao Lu
Lijuan Wang
MLLM
VLM
173
249
0
24 Nov 2021
Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation
Yan Zhang
David W. Zhang
Simon Lacoste-Julien
Gertjan J. Burghouts
Cees G. M. Snoek
BDL
104
22
0
23 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Faisal Ahmed
Zicheng Liu
Yumao Lu
Lijuan Wang
146
117
0
23 Nov 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
185
907
0
22 Nov 2021
Class-agnostic Object Detection with Multi-modal Transformer
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad Shahbaz Khan
Rao Muhammad Anwer
Ming-Hsuan Yang
156
97
0
22 Nov 2021
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture
Daria Bakshandaeva
Denis Dimitrov
V.Ya. Arkhipkin
Alex Shonenkov
M. Potanin
...
Mikhail Martynov
Anton Voronov
Vera Davydova
E. Tutubalina
Aleksandr Petiushko
110
0
0
22 Nov 2021
Medical Visual Question Answering: A Survey
Zhihong Lin
Donghao Zhang
Qingyi Tao
Danli Shi
Gholamreza Haffari
Qi Wu
M. He
Z. Ge
114
122
0
19 Nov 2021
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
78
57
0
19 Nov 2021
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
M. Gao
Chen Xing
Juan Carlos Niebles
Junnan Li
Ran Xu
Wenhao Liu
Caiming Xiong
VLM
ObjD
104
86
0
18 Nov 2021
Learning to Compose Visual Relations
Nan Liu
Shuang Li
Yilun Du
J. Tenenbaum
Antonio Torralba
CoGe
OCL
91
80
0
17 Nov 2021
Transparent Human Evaluation for Image Captioning
Jungo Kasai
Keisuke Sakaguchi
Lavinia Dunagan
Jacob Morrison
Ronan Le Bras
Yejin Choi
Noah A. Smith
82
49
0
17 Nov 2021
Achieving Human Parity on Visual Question Answering
Ming Yan
Haiyang Xu
Chenliang Li
Junfeng Tian
Bin Bi
...
Ji Zhang
Songfang Huang
Fei Huang
Luo Si
Rong Jin
63
13
0
17 Nov 2021
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng
Xinsong Zhang
Hang Li
VLM
CLIP
95
308
0
16 Nov 2021
Visual Intelligence through Human Interaction
Ranjay Krishna
Mitchell L. Gordon
Fei-Fei Li
Michael S. Bernstein
69
8
0
12 Nov 2021
Full Characterization of Adaptively Strong Majority Voting in Crowdsourcing
M. Boyarskaya
Panagiotis G. Ipeirotis
52
0
0
11 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
193
356
0
11 Nov 2021
Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective
Zhengzhuo Xu
Zenghao Chai
Chun Yuan
121
53
0
06 Nov 2021
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Zi-Yi Dou
Yichong Xu
Zhe Gan
Jianfeng Wang
Shuohang Wang
...
Pengchuan Zhang
Lu Yuan
Nanyun Peng
Zicheng Liu
Michael Zeng
VLM
104
381
0
03 Nov 2021
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao
Wenhui Wang
Li Dong
Qiang Liu
Owais Khan Mohammed
Kriti Aggarwal
Subhojit Som
Furu Wei
VLM
MLLM
MoE
104
560
0
03 Nov 2021
From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence
Nicholas Roy
Ingmar Posner
Timothy D. Barfoot
Philippe Beaudoin
Yoshua Bengio
...
S. Schaal
Gaurav Sukhatme
D. Thérien
Marc Toussaint
M. van de Panne
114
56
0
28 Oct 2021
Perceptual Score: What Data Modalities Does Your Model Perceive?
Itai Gat
Idan Schwartz
Alex Schwing
96
32
0
27 Oct 2021
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
A. Moudgil
Arjun Majumdar
Harsh Agrawal
Stefan Lee
Dhruv Batra
LM&Ro
84
61
0
27 Oct 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
164
206
0
25 Oct 2021
MIGS: Meta Image Generation from Scene Graphs
R. Rajagede
Sabrina Musatian
Helisa Dhamo
Nassir Navab
VLM
80
10
0
22 Oct 2021
Single-Modal Entropy based Active Learning for Visual Question Answering
Dong-Jin Kim
Jae-Won Cho
Jinsoo Choi
Yunjae Jung
In So Kweon
63
12
0
21 Oct 2021
Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization
A. Maharana
Joey Tianyi Zhou
84
63
0
21 Oct 2021
AI-Based Detection, Classification and Prediction/Prognosis in Medical Imaging: Towards Radiophenomics
F. Yousefirizi
P. Decazes
Amine Amyar
S. Ruan
Babak Saboury
Arman Rahmim
LM&MA
MedIm
87
45
0
20 Oct 2021
StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects
Weiyu Liu
Chris Paxton
Tucker Hermans
Dieter Fox
104
94
0
19 Oct 2021
Towards Optimal Correlational Object Search
Kaiyu Zheng
Rohan Chitnis
Yoonchang Sung
George Konidaris
Stefanie Tellex
77
22
0
19 Oct 2021
A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation
Yupan Huang
Bei Liu
Jianlong Fu
Yutong Lu
DiffM
65
6
0
19 Oct 2021
Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang
Hongwei Xue
Bei Liu
Yutong Lu
79
59
0
19 Oct 2021
Towards Language-guided Visual Recognition via Dynamic Convolutions
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yongjian Wu
Yue Gao
Rongrong Ji
ObjD
98
19
0
17 Oct 2021
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
Woojeong Jin
Yu Cheng
Yelong Shen
Weizhu Chen
Xiang Ren
VLM
VPVLM
MLLM
117
138
0
16 Oct 2021
Self-Annotated Training for Controllable Image Captioning
Zhangzi Zhu
Tianlei Wang
Hong Qu
72
2
0
16 Oct 2021
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels
Mohit Sharma
Rajkumar Patra
Harshali Desai
Shruti Vyas
Yogesh S Rawat
R. Shah
VGen
NoLa
59
3
0
13 Oct 2021
Topic Scene Graph Generation by Attention Distillation from Caption
Wenbin Wang
R. Wang
X. Chen
DiffM
94
14
0
12 Oct 2021
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking
Dirk Vath
Pascal Tilli
Ngoc Thang Vu
82
4
0
11 Oct 2021
Learning Single/Multi-Attribute of Object with Symmetry and Group
Yong-Lu Li
Yue Xu
Xinyu Xu
Xiaohan Mao
Cewu Lu
133
6
0
09 Oct 2021
Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content
Alan Lundgard
Arvind Satyanarayan
59
136
0
08 Oct 2021
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning
Ali Furkan Biten
L. G. I. Bigorda
Dimosthenis Karatzas
159
63
0
04 Oct 2021