Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,650 papers shown
Transformer Reasoning Network for Image-Text Matching and Retrieval
Nicola Messina
Fabrizio Falchi
Andrea Esuli
Giuseppe Amato
ViT
68
58
0
20 Apr 2020
Graph-Structured Referring Expression Reasoning in The Wild
Sibei Yang
Guanbin Li
Yizhou Yu
NAI
74
95
0
19 Apr 2020
Are we pretraining it right? Digging deeper into visio-linguistic pretraining
Amanpreet Singh
Vedanuj Goswami
Devi Parikh
VLM
78
48
0
19 Apr 2020
CPARR: Category-based Proposal Analysis for Referring Relationships
Chuanzi He
Haidong Zhu
Jiyang Gao
Kan Chen
Ram Nevatia
132
7
0
17 Apr 2020
REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets
Angelina Wang
Alexander Liu
Ryan Zhang
Anat Kleiman
Leslie Kim
Dora Zhao
Iroha Shirai
Arvind Narayanan
Olga Russakovsky
89
191
0
16 Apr 2020
Relation Transformer Network
Rajat Koner
Poulami Sinhamahapatra
Volker Tresp
ViT
107
33
0
13 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
204
1,954
0
13 Apr 2020
A Survey of Single-Scene Video Anomaly Detection
B. Ramachandra
Michael J. Jones
Ranga Raju Vatsavai
AI4TS
96
180
0
13 Apr 2020
SpatialSim: Recognizing Spatial Configurations of Objects with Graph Neural Networks
Laetitia Teodorescu
Katja Hofmann
Pierre-Yves Oudeyer
58
1
0
09 Apr 2020
Learning to Scale Multilingual Representations for Vision-Language Tasks
Andrea Burns
Donghyun Kim
Derry Wijaya
Kate Saenko
Bryan A. Plummer
50
35
0
09 Apr 2020
Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions
Johanna Wald
Helisa Dhamo
Nassir Navab
Federico Tombari
3DV 3DPC
75
220
0
08 Apr 2020
Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing
Goonmeet Bajaj
Bortik Bandyopadhyay
Daniela Schmidt
Pranav Maneriker
Christopher Myers
Srinivasan Parthasarathy
35
2
0
08 Apr 2020
Context-Aware Group Captioning via Self-Attention and Contrastive Features
Zhuowan Li
Quan Hung Tran
Long Mai
Zhe Lin
Alan Yuille
VLM
81
44
0
07 Apr 2020
Semantic Image Manipulation Using Scene Graphs
Helisa Dhamo
Azade Farshad
Iro Laina
Nassir Navab
Gregory Hager
Federico Tombari
Christian Rupprecht
113
121
0
07 Apr 2020
Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation
M. Moghaddam
Qi Wu
Ehsan Abbasnejad
Javen Qinfeng Shi
64
4
0
07 Apr 2020
SHOP-VRB: A Visual Reasoning Benchmark for Object Perception
Michal Nazarczuk
K. Mikolajczyk
72
21
0
06 Apr 2020
B-SCST: Bayesian Self-Critical Sequence Training for Image Captioning
Shashank Bujimalla
Mahesh Subedar
Omesh Tickoo
BDL UQCV
25
10
0
06 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
194
440
0
02 Apr 2020
Symmetry and Group in Attribute-Object Compositions
Yong-Lu Li
Yue Xu
Xiaohan Mao
Cewu Lu
97
119
0
01 Apr 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
Yuanen Zhou
Meng Wang
Daqing Liu
Zhenzhen Hu
Hanwang Zhang
90
126
0
01 Apr 2020
Graph Structured Network for Image-Text Matching
Chunxiao Liu
Zhendong Mao
Tianzhu Zhang
Hongtao Xie
Bin Wang
Yongdong Zhang
84
239
0
01 Apr 2020
X-Linear Attention Networks for Image Captioning
Yingwei Pan
Ting Yao
Yehao Li
Tao Mei
134
519
0
31 Mar 2020
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
Boxiao Pan
Haoye Cai
De-An Huang
Kuan-Hui Lee
Adrien Gaidon
Ehsan Adeli
Juan Carlos Niebles
79
236
0
31 Mar 2020
GPS-Net: Graph Property Sensing Network for Scene Graph Generation
Xin Lin
Changxing Ding
Jinquan Zeng
Dacheng Tao
128
284
0
29 Mar 2020
Grounded Situation Recognition
Sarah M Pratt
Mark Yatskar
Luca Weihs
Ali Farhadi
Aniruddha Kembhavi
99
112
0
26 Mar 2020
VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
J. Liu
Wenhu Chen
Yu Cheng
Zhe Gan
Licheng Yu
Yiming Yang
Jingjing Liu
MLLM VGen
102
70
0
25 Mar 2020
Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis
Wei Sun
Tianfu Wu
97
84
0
25 Mar 2020
Video Object Grounding using Semantic Roles in Language Description
Arka Sadhu
Kan Chen
Ram Nevatia
143
48
0
24 Mar 2020
Learning Object Permanence from Video
Aviv Shamsian
Ofri Kleinfeld
Amir Globerson
Gal Chechik
SSL
138
32
0
23 Mar 2020
Visual Question Answering for Cultural Heritage
P. Bongini
Federico Becattini
Andrew D. Bagdanov
A. Bimbo
479
24
0
22 Mar 2020
Affinity Graph Supervision for Visual Recognition
Chu Wang
Babak Samari
Vladimir G. Kim
S. Chaudhuri
Kaleem Siddiqi
GNN
57
8
0
19 Mar 2020
Object-Centric Image Generation from Layouts
Tristan Sylvain
Pengchuan Zhang
Yoshua Bengio
R. Devon Hjelm
Shikhar Sharma
EGVM OCL
156
102
0
16 Mar 2020
Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge into Deep Neural Networks
Karan Sikka
Andrew Silberfarb
John Byrnes
Indranil Sur
Edmond Chow
Ajay Divakaran
R. Rohwer
NAI
75
11
0
16 Mar 2020
Learning hierarchical relationships for object-goal navigation
Yiding Qiu
Anwesan Pal
H. Christensen
98
8
0
15 Mar 2020
Deconfounded Image Captioning: A Causal Retrospect
Xu Yang
Hanwang Zhang
Jianfei Cai
CML
79
126
0
09 Mar 2020
A Study on Multimodal and Interactive Explanations for Visual Question Answering
Kamran Alipour
J. Schulze
Yi Yao
Avi Ziskind
Giedrius Burachas
64
27
0
01 Mar 2020
Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension
Zhenfang Chen
Peng Wang
Lin Ma
Kwan-Yee K. Wong
Qi Wu
ObjD
125
68
0
01 Mar 2020
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen
Qin Jin
Peng Wang
Qi Wu
DiffM
131
219
0
01 Mar 2020
Visual Commonsense R-CNN
Tan Wang
Jianqiang Huang
Hanwang Zhang
Qianru Sun
SSL ObjD CML
86
252
0
27 Feb 2020
Unbiased Scene Graph Generation from Biased Training
Kaihua Tang
Yulei Niu
Jianqiang Huang
Jiaxin Shi
Hanwang Zhang
CML
98
703
0
27 Feb 2020
Unshuffling Data for Improved Generalization
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
OOD
77
78
0
27 Feb 2020
Analysis of diversity-accuracy tradeoff in image captioning
Ruotian Luo
Gregory Shakhnarovich
65
13
0
27 Feb 2020
From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN)
Xin Ye
Yezhou Yang
SSL
114
16
0
26 Feb 2020
On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering
Xinyu Wang
Yuliang Liu
Chunhua Shen
Chun Chet Ng
Canjie Luo
Lianwen Jin
C. Chan
Anton Van Den Hengel
Liangwei Wang
101
97
0
24 Feb 2020
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
105
184
0
20 Feb 2020
Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN
Hang Xu
Linpu Fang
Xiaodan Liang
Wenxiong Kang
Zhenguo Li
ObjD
63
23
0
18 Feb 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
C. Sur
116
7
0
16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
C. Sur
52
17
0
15 Feb 2020
3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans
Antoni Rosinol
Arjun Gupta
Marcus Abate
Jingang Shi
Luca Carlone
96
196
0
15 Feb 2020
Sparse and Structured Visual Attention
Pedro Henrique Martins
S. Becker
Zita Marinho
Michael Arens
78
8
0
13 Feb 2020