1602.07332
Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
Papers citing
"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"
50 / 1,137 papers shown
Title
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
58
816
0
14 Jun 2021
Supervising the Transfer of Reasoning Patterns in VQA
Corentin Kervadec
Christian Wolf
G. Antipov
M. Baccouche
Madiha Nadri Wolf
30
10
0
10 Jun 2021
PAM: Understanding Product Images in Cross Product Category Attribute Extraction
Rongmei Lin
Xiang He
J. Feng
Nasser Zalmout
Yan Liang
Li Xiong
Xin Luna Dong
36
35
0
08 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Muchen Li
Leonid Sigal
ObjD
13
188
0
06 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
33
372
0
04 Jun 2021
Linguistic Structures as Weak Supervision for Visual Scene Graph Generation
Keren Ye
Adriana Kovashka
29
52
0
28 May 2021
How saccadic vision might help with the interpretability of deep networks
Iana Sereda
Grigory V. Osipov
FAtt
33
0
0
27 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
20
55
0
24 May 2021
Exemplar-Based Open-Set Panoptic Segmentation Network
Jaedong Hwang
Seoung Wug Oh
Joon-Young Lee
Bohyung Han
VLM
33
50
0
18 May 2021
Multi-Modal Image Captioning for the Visually Impaired
Hiba Ahsan
Nikita Bhalla
Daivat Bhatt
Kaivankumar Shah
25
20
0
17 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
29
140
0
17 May 2021
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
Amanpreet Singh
Guan Pang
Mandy Toh
Jing Huang
Wojciech Galuba
Tal Hassner
19
165
0
12 May 2021
T-EMDE: Sketching-based global similarity for cross-modal retrieval
Barbara Rychalska
Mikolaj Wieczorek
Jacek Dąbrowski
33
0
0
10 May 2021
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
D. Guo
Ruiying Lu
Bo Chen
Zequn Zeng
Mingyuan Zhou
VLM
25
9
0
10 May 2021
Exploring Explicit and Implicit Visual Relationships for Image Captioning
Zeliang Song
Xiaofei Zhou
21
7
0
06 May 2021
Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries
Qingyi Dong
Zhuowen Tu
Haofu Liao
Yuting Zhang
Vijay Mahadevan
Stefano Soatto
ViT
19
38
0
05 May 2021
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads
Chenyu Gao
Qi Zhu
Peng Wang
Qi Wu
18
2
0
30 Apr 2021
Segmentation-grounded Scene Graph Generation
Siddhesh Khandelwal
M. Suhail
Leonid Sigal
29
27
0
29 Apr 2021
REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter
Hanbo Zhang
Deyu Yang
Han Wang
Binglei Zhao
Xuguang Lan
Jishiyu Ding
Nanning Zheng
44
40
0
29 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
90
864
0
26 Apr 2021
InfographicVQA
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
42
206
0
26 Apr 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
109
54
0
23 Apr 2021
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
Guanghui Xu
Shuaicheng Niu
Mingkui Tan
Yucheng Luo
Qing Du
Qi Wu
DiffM
22
56
0
23 Apr 2021
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong-jin Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
Chunyan Miao
Houqiang Li
30
41
0
19 Apr 2021
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Shir Gur
Natalia Neverova
C. Stauffer
Ser-Nam Lim
Douwe Kiela
A. Reiter
22
26
0
16 Apr 2021
Exploring Visual Engagement Signals for Representation Learning
Menglin Jia
Zuxuan Wu
A. Reiter
Claire Cardie
Serge Belongie
Ser-Nam Lim
21
13
0
15 Apr 2021
Zero-Shot Instance Segmentation
Ye Zheng
Jiahong Wu
Yongqiang Qin
Faen Zhang
Li Cui
ISeg
VLM
30
54
0
14 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
32
150
0
13 Apr 2021
StylePTB: A Compositional Benchmark for Fine-grained Controllable Text Style Transfer
Yiwei Lyu
Paul Pu Liang
Hai Pham
Eduard H. Hovy
Barnabas Poczos
Ruslan Salakhutdinov
Louis-Philippe Morency
27
41
0
12 Apr 2021
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
Yuankai Qi
Zizheng Pan
Yicong Hong
Ming-Hsuan Yang
Anton Van Den Hengel
Qi Wu
LM&Ro
34
68
0
09 Apr 2021
SiT: Self-supervised vIsion Transformer
Sara Atito Ali Ahmed
Muhammad Awais
J. Kittler
ViT
39
139
0
08 Apr 2021
Automatic Generation of Descriptive Titles for Video Clips Using Deep Learning
Soheyla Amirian
Khaled Rasheed
T. Taha
H. Arabnia
VLM
VGen
19
23
0
07 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
51
271
0
07 Apr 2021
Multimodal Continuous Visual Attention Mechanisms
António Farinhas
André F. T. Martins
P. Aguiar
22
7
0
07 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
39
97
0
05 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
19
26
0
02 Apr 2021
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
11
50
0
01 Apr 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM
VLM
31
89
0
01 Apr 2021
Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation
Rongjie Li
Songyang Zhang
Bo Wan
Xuming He
187
214
0
01 Apr 2021
Benchmarking Representation Learning for Natural World Image Collections
Grant Van Horn
Elijah Cole
Sara Beery
Kimberly Wilber
Serge J. Belongie
Oisin Mac Aodha
SSL
VLM
26
165
0
30 Mar 2021
Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays
Xiaosong Wang
Ziyue Xu
Leo K. Tam
Dong Yang
Daguang Xu
ViT
MedIm
25
23
0
30 Mar 2021
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Yongfei Liu
Bo Wan
Lin Ma
Xuming He
ObjD
24
56
0
24 Mar 2021
Human-like Controllable Image Captioning with Verb-specific Semantic Roles
Long Chen
Zhihong Jiang
Jun Xiao
Wei Liu
30
74
0
22 Mar 2021
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
Yonatan Bitton
Gabriel Stanovsky
Roy Schwartz
Michael Elhadad
CoGe
25
33
0
17 Mar 2021
A Comprehensive Survey of Scene Graphs: Generation and Application
Xiaojun Chang
Pengzhen Ren
Pengfei Xu
Zhihui Li
Xiaojiang Chen
Alexander G. Hauptmann
3DV
33
222
0
17 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
24
56
0
16 Mar 2021
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Siqi Sun
Yen-Chun Chen
Linjie Li
Shuohang Wang
Yuwei Fang
Jingjing Liu
VLM
38
82
0
16 Mar 2021
Knowledge driven Description Synthesis for Floor Plan Interpretation
Shreya Goyal
Chiranjoy Chattopadhyay
Gaurav Bhatnagar
3DV
23
13
0
15 Mar 2021
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images
Haolin Liu
Anran Lin
Xiaoguang Han
Lei Yang
Yizhou Yu
Shuguang Cui
27
40
0
14 Mar 2021
SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels
Chenliang Li
Ming Yan
Haiyang Xu
Fuli Luo
Wei Wang
Bin Bi
Songfang Huang
VLM
34
36
0
14 Mar 2021