ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1602.07332
  4. Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense
  Image Annotations

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
ArXivPDFHTML

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,166 papers shown
Title
Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable
  and Advisable AI Systems
Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems
Subbarao Kambhampati
S. Sreedharan
Mudit Verma
Yantian Zha
L. Guan
52
47
0
21 Sep 2021
Multi-Agent Embodied Visual Semantic Navigation with Scene Prior
  Knowledge
Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge
Xinzhu Liu
Di Guo
Huaping Liu
F. Sun
EgoV
20
23
0
20 Sep 2021
Screen Parsing: Towards Reverse Engineering of UI Models from
  Screenshots
Screen Parsing: Towards Reverse Engineering of UI Models from Screenshots
Jason Wu
Xiaoyi Zhang
Jeffrey Nichols
Jeffrey P. Bigham
3DV
165
71
0
17 Sep 2021
GoG: Relation-aware Graph-over-Graph Network for Visual Dialog
GoG: Relation-aware Graph-over-Graph Network for Visual Dialog
Feilong Chen
Xiuyi Chen
Fandong Meng
Peng Li
Jie Zhou
76
34
0
17 Sep 2021
Cross Modification Attention Based Deliberation Model for Image
  Captioning
Cross Modification Attention Based Deliberation Model for Image Captioning
Zheng Lian
Yanan Zhang
Haichang Li
Rui Wang
Xiaohui Hu
24
4
0
17 Sep 2021
Image Captioning for Effective Use of Language Models in Knowledge-Based
  Visual Question Answering
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering
Ander Salaberria
Gorka Azkune
Oier López de Lacalle
Aitor Soroa Etxabe
Eneko Agirre
33
59
0
15 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the
  Dataset into Explicit Training Examples for Visual Question Answering
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
61
20
0
13 Sep 2021
xGQA: Cross-Lingual Visual Question Answering
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
42
56
0
13 Sep 2021
Explain Me the Painting: Multi-Topic Knowledgeable Art Description
  Generation
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation
Zechen Bai
Yuta Nakashima
Noa Garcia
68
43
0
13 Sep 2021
BGT-Net: Bidirectional GRU Transformer Network for Scene Graph
  Generation
BGT-Net: Bidirectional GRU Transformer Network for Scene Graph Generation
Naina Dhingra
Florian Ritter
A. Kunz
73
37
0
11 Sep 2021
Partially-Supervised Novel Object Captioning Leveraging Context from
  Paired Data
Partially-Supervised Novel Object Captioning Leveraging Context from Paired Data
Shashank Bujimalla
Mahesh Subedar
Omesh Tickoo
40
1
0
10 Sep 2021
Panoptic Narrative Grounding
Panoptic Narrative Grounding
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
90
22
0
10 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
21
2
0
09 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial
  Multi-modal Pretraining
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
35
37
0
09 Sep 2021
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense
  in Text Generation Models
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
Steven Y. Feng
Kevin Lu
Zhuofu Tao
Malihe Alikhani
Teruko Mitamura
Eduard H. Hovy
Varun Gangal
LRM
40
13
0
08 Sep 2021
YouRefIt: Embodied Reference Understanding with Language and Gesture
YouRefIt: Embodied Reference Understanding with Language and Gesture
Yixin Chen
Qing Li
Deqian Kong
Yik Lun Kei
Song-Chun Zhu
Tao Gao
Yixin Zhu
Siyuan Huang
LM&Ro
37
42
0
08 Sep 2021
Learning to Generate Scene Graph from Natural Language Supervision
Learning to Generate Scene Graph from Natural Language Supervision
Yiwu Zhong
Jing Shi
Jianwei Yang
Chenliang Xu
Yin Li
SSL
42
77
0
06 Sep 2021
GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph
GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph
Zhixuan Zhang
Chi Zhang
Zhenning Niu
Le Wang
Yuehu Liu
26
7
0
06 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question
  Answering
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
30
18
0
04 Sep 2021
N24News: A New Dataset for Multimodal News Classification
N24News: A New Dataset for Multimodal News Classification
Zhen Wang
Xu Shan
Xiangxie Zhang
Jie Yang
VLM
26
33
0
30 Aug 2021
From General to Specific: Informative Scene Graph Generation via Balance
  Adjustment
From General to Specific: Informative Scene Graph Generation via Balance Adjustment
Yuyu Guo
Lianli Gao
Xuanhan Wang
Yuxuan Hu
Xing Xu
Xu Lu
Heng Tao Shen
Jingkuan Song
66
84
0
30 Aug 2021
Zero-shot Natural Language Video Localization
Zero-shot Natural Language Video Localization
Jinwoo Nam
Daechul Ahn
Dongyeop Kang
S. Ha
Jonghyun Choi
94
43
0
29 Aug 2021
Few-shot Visual Relationship Co-localization
Few-shot Visual Relationship Co-localization
Revant Teotia
Vaibhav Mishra
Mayank Maheshwari
Anand Mishra
20
1
0
26 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
51
782
0
24 Aug 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering
Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang
Chongyang Gao
Hanwang Zhang
Jianfei Cai
24
35
0
24 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
36
10
0
21 Aug 2021
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Xiaopeng Lu
Zhenhua Fan
Yansen Wang
Jean Oh
Carolyn Rose
32
27
0
20 Aug 2021
Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using
  Scene Graphs
Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs
Helisa Dhamo
Fabian Manhardt
Nassir Navab
F. Tombari
3DV
14
67
0
19 Aug 2021
Exploiting Scene Graphs for Human-Object Interaction Detection
Exploiting Scene Graphs for Human-Object Interaction Detection
Tao He
Lianli Gao
Jingkuan Song
Yuan-Fang Li
30
27
0
19 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and
  Intra-modal Knowledge Integration
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
27
53
0
16 Aug 2021
Unconditional Scene Graph Generation
Unconditional Scene Graph Generation
Sarthak Garg
Helisa Dhamo
Azade Farshad
Sabrina Musatian
Nassir Navab
F. Tombari
18
23
0
12 Aug 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual,
  Interactive, and Ecological Environments
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments
S. Srivastava
Chengshu Li
Michael Lingelbach
Roberto Martín-Martín
Fei Xia
...
Chenxi Liu
Silvio Savarese
H. Gweon
Jiajun Wu
Li Fei-Fei
LM&Ro
156
159
0
06 Aug 2021
Dual Graph Convolutional Networks with Transformer and Curriculum
  Learning for Image Captioning
Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
Xinzhi Dong
Chengjiang Long
Wenju Xu
Chunxia Xiao
ViT
81
66
0
05 Aug 2021
Hierarchical Representations and Explicit Memory: Learning Effective
  Navigation Policies on 3D Scene Graphs using Graph Neural Networks
Hierarchical Representations and Explicit Memory: Learning Effective Navigation Policies on 3D Scene Graphs using Graph Neural Networks
Zachary Ravichandran
Lisa Peng
Nathan Hughes
J. D. Griffith
Luca Carlone
42
69
0
02 Aug 2021
Chest ImaGenome Dataset for Clinical Reasoning
Chest ImaGenome Dataset for Clinical Reasoning
Joy T. Wu
Nkechinyere N. Agu
Ismini Lourentzou
Arjun Sharma
J. Paguio
...
William Mitchell
Satyananda Kashyap
Andrea Giovannini
Leo Anthony Celi
Mehdi Moradi
21
65
0
31 Jul 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval
  via Cross-modal Pretraining
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining
Xunlin Zhan
Yangxin Wu
Xiao Dong
Yunchao Wei
Minlong Lu
Yichi Zhang
Hang Xu
Xiaodan Liang
ViT
34
64
0
30 Jul 2021
A Thorough Review on Recent Deep Learning Methodologies for Image
  Captioning
A Thorough Review on Recent Deep Learning Methodologies for Image Captioning
Ahmed Elhagry
Karima Kadaoui
VLM
33
17
0
28 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
22
122
0
26 Jul 2021
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text
  Recognition
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
A. Bhunia
Aneeshan Sain
Amandeep Kumar
S. Ghose
Pinaki Nath Chowdhury
Yi-Zhe Song
32
56
0
26 Jul 2021
Query2Label: A Simple Transformer Way to Multi-Label Classification
Query2Label: A Simple Transformer Way to Multi-Label Classification
Shilong Liu
Lei Zhang
Xiao Yang
Hang Su
Jun Zhu
26
187
0
22 Jul 2021
Align before Fuse: Vision and Language Representation Learning with
  Momentum Distillation
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Chenyu You
Caiming Xiong
Guosheng Lin
FaML
83
1,893
0
16 Jul 2021
Neighbor-view Enhanced Model for Vision and Language Navigation
Neighbor-view Enhanced Model for Vision and Language Navigation
Dongyan An
Yuankai Qi
Yan Huang
Qi Wu
Liang Wang
Tieniu Tan
LM&Ro
24
67
0
15 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
69
256
0
14 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
202
406
0
13 Jul 2021
Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge
  Integration
Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration
Xuan Kan
Hejie Cui
Carl Yang
78
40
0
11 Jul 2021
Predicate correlation learning for scene graph generation
Predicate correlation learning for scene graph generation
Lei Tao
Li Mi
Nannan Li
Xianhang Cheng
Yaosi Hu
Zhenzhong Chen
45
17
0
06 Jul 2021
Recovering the Unbiased Scene Graphs from the Biased Ones
Recovering the Unbiased Scene Graphs from the Biased Ones
Meng-Jiun Chiou
Henghui Ding
Hanshu Yan
Changhu Wang
Roger Zimmermann
Jiashi Feng
55
113
0
05 Jul 2021
Web-Scale Generic Object Detection at Microsoft Bing
Web-Scale Generic Object Detection at Microsoft Bing
S. Chen
Saurajit Mukherjee
Unmesh Phadke
Tingting Wang
Junwon Park
Ravi Theja Yada
ObjD
VLM
19
0
0
05 Jul 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and
  Generation
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation
Jing Liu
Xinxin Zhu
Fei Liu
Longteng Guo
Zijia Zhao
...
Weining Wang
Hanqing Lu
Shiyu Zhou
Jiajun Zhang
Jinqiao Wang
39
37
0
01 Jul 2021
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded
  Compositional Visual Question Answering based on Scene Graphs
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
30
2
0
28 Jun 2021
Previous
123...131415...222324
Next