ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1602.07332
  4. Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense
  Image Annotations

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
ArXiv (abs)PDFHTML

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,650 papers shown
Title
Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation
Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation
Wenbin Wang
Ruiping Wang
Shiguang Shan
Xilin Chen
3DH
102
53
0
17 Jul 2020
Detecting Human-Object Interactions with Action Co-occurrence Priors
Detecting Human-Object Interactions with Action Co-occurrence Priors
Dong-Jin Kim
Xiao Sun
Jinsoo Choi
Stephen Lin
In So Kweon
76
125
0
17 Jul 2020
RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval
RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval
Hung-Yu Tseng
Hsin-Ying Lee
Lu Jiang
Ming-Hsuan Yang
Weilong Yang
DiffM3DV
159
54
0
16 Jul 2020
Explore and Explain: Self-supervised Navigation and Recounting
Explore and Explain: Self-supervised Navigation and Recounting
Roberto Bigazzi
Federico Landi
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
EgoVLM&Ro
78
17
0
14 Jul 2020
Reducing Language Biases in Visual Question Answering with
  Visually-Grounded Question Encoder
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
K. Gouthaman
Anurag Mittal
98
79
0
13 Jul 2020
Generative Compositional Augmentations for Scene Graph Prediction
Generative Compositional Augmentations for Scene Graph Prediction
Boris Knyazev
H. D. Vries
Cătălina Cangea
Graham W. Taylor
Aaron Courville
Eugene Belilovsky
106
26
0
11 Jul 2020
Image Captioning with Compositional Neural Module Networks
Image Captioning with Compositional Neural Module Networks
Junjiao Tian
Jean Oh
44
11
0
10 Jul 2020
IQ-VQA: Intelligent Visual Question Answering
IQ-VQA: Intelligent Visual Question Answering
Vatsal Goel
Mohit Chandak
A. Anand
Prithwijit Guha
64
5
0
08 Jul 2020
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal
  Shuffled Transformers
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
Shijie Geng
Peng Gao
Moitreya Chatterjee
Chiori Hori
Jonathan Le Roux
Yongfeng Zhang
Hongsheng Li
A. Cherian
101
11
0
08 Jul 2020
Modality Shifting Attention Network for Multi-modal Video Question
  Answering
Modality Shifting Attention Network for Multi-modal Video Question Answering
Junyeong Kim
Minuk Ma
T. Pham
Kyungsu Kim
Chang D. Yoo
84
72
0
04 Jul 2020
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge
  Distillation
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation
Liwei Wang
Jing-ling Huang
Yin Li
Kun Xu
Zhengyuan Yang
Dong Yu
ObjD
74
84
0
03 Jul 2020
DocVQA: A Dataset for VQA on Document Images
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
169
748
0
01 Jul 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through
  Scene Graph
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
128
382
0
30 Jun 2020
Learning Physical Graph Representations from Visual Scenes
Learning Physical Graph Representations from Visual Scenes
Daniel M. Bear
Chaofei Fan
Damian Mrowca
Yunzhu Li
S. Alter
...
Jeremy Schwartz
Li Fei-Fei
Jiajun Wu
J. Tenenbaum
Daniel L. K. Yamins
SSLGNNSSegAI4CE
108
79
0
22 Jun 2020
Improving Image Captioning with Better Use of Captions
Improving Image Captioning with Better Use of Captions
Zhan Shi
Xu Zhou
Xipeng Qiu
Xiao-Dan Zhu
66
128
0
21 Jun 2020
Learning Visual Commonsense for Robust Scene Graph Generation
Learning Visual Commonsense for Robust Scene Graph Generation
Alireza Zareian
Zhecan Wang
Haoxuan You
Shih-Fu Chang
102
311
0
17 Jun 2020
Modeling Graph Structure via Relative Position for Text Generation from
  Knowledge Graphs
Modeling Graph Structure via Relative Position for Text Generation from Knowledge Graphs
Martin Schmitt
Leonardo F. R. Ribeiro
Philipp Dufter
Iryna Gurevych
Hinrich Schütze
GNN
50
8
0
16 Jun 2020
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual
  Question Answering
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Zihao Zhu
Jiahao Yu
Yujing Wang
Yajing Sun
Yue Hu
Qi Wu
103
129
0
16 Jun 2020
Exploiting Visual Semantic Reasoning for Video-Text Retrieval
Exploiting Visual Semantic Reasoning for Video-Text Retrieval
Zerun Feng
Zhimin Zeng
Caili Guo
Zheng Li
79
36
0
16 Jun 2020
Generative 3D Part Assembly via Dynamic Graph Learning
Generative 3D Part Assembly via Dynamic Graph Learning
Jialei Huang
Guanqi Zhan
Qingnan Fan
Kaichun Mo
Lin Shao
Baoquan Chen
Leonidas Guibas
Hao Dong
126
89
0
14 Jun 2020
Learning from the Scene and Borrowing from the Rich: Tackling the Long
  Tail in Scene Graph Generation
Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation
Tao He
Lianli Gao
Jingkuan Song
Jianfei Cai
Yuan-Fang Li
80
31
0
13 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSLVLM
173
437
0
11 Jun 2020
Exploring Weaknesses of VQA Models through Attribution Driven Insights
Exploring Weaknesses of VQA Models through Attribution Driven Insights
Shaunak Halbe
37
2
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation
  Learning
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjDVLM
133
501
0
11 Jun 2020
Roses Are Red, Violets Are Blue... but Should Vqa Expect Them To?
Roses Are Red, Violets Are Blue... but Should Vqa Expect Them To?
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
OOD
88
90
0
09 Jun 2020
Counterfactual VQA: A Cause-Effect Look at Language Bias
Counterfactual VQA: A Cause-Effect Look at Language Bias
Yulei Niu
Kaihua Tang
Hanwang Zhang
Zhiwu Lu
Xiansheng Hua
Ji-Rong Wen
CML
147
403
0
08 Jun 2020
Pick-Object-Attack: Type-Specific Adversarial Attack for Object
  Detection
Pick-Object-Attack: Type-Specific Adversarial Attack for Object Detection
Omid Mohamad Nezami
Akshay Chaturvedi
Mark Dras
Utpal Garain
AAMLObjD
61
19
0
05 Jun 2020
Give Me Something to Eat: Referring Expression Comprehension with
  Commonsense Knowledge
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge
Peng Wang
Dongyang Liu
Hui Li
Qi Wu
ObjD
70
19
0
02 Jun 2020
Structured Multimodal Attentions for TextVQA
Structured Multimodal Attentions for TextVQA
Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
Anton Van Den Hengel
Qi Wu
99
60
0
01 Jun 2020
FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal
  Retrieval
FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval
D. Gao
Linbo Jin
Ben Chen
Minghui Qiu
Peng Li
Yi Wei
Yitao Hu
Haozhe Jasper Wang
OOD
84
134
0
20 May 2020
Graph Density-Aware Losses for Novel Compositions in Scene Graph
  Generation
Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation
Boris Knyazev
H. D. Vries
Cătălina Cangea
Graham W. Taylor
Aaron Courville
Eugene Belilovsky
74
56
0
17 May 2020
Visual Relationship Detection using Scene Graphs: A Survey
Visual Relationship Detection using Scene Graphs: A Survey
Aniket Agarwal
Ayush Mangal
Vipul
GNN
70
21
0
16 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained
  Vision-and-Language Models
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
115
130
0
15 May 2020
Cross-Modality Relevance for Reasoning on Language and Vision
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng
Quan Guo
Parisa Kordjamshidi
LRM
88
36
0
12 May 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
109
612
0
10 May 2020
History for Visual Dialog: Do we really need it?
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
38
71
0
08 May 2020
Diagnosing the Environment Bias in Vision-and-Language Navigation
Diagnosing the Environment Bias in Vision-and-Language Navigation
Yubo Zhang
Hao Tan
Joey Tianyi Zhou
73
57
0
06 May 2020
What are the Goals of Distributional Semantics?
What are the Goals of Distributional Semantics?
Guy Edward Toh Emerson
91
26
0
06 May 2020
Words aren't enough, their order matters: On the Robustness of Grounding
  Visual Referring Expressions
Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions
Arjun Reddy Akula
Spandana Gella
Yaser Al-Onaizan
Song-Chun Zhu
Siva Reddy
ObjD
69
52
0
04 May 2020
Visually Grounded Continual Learning of Compositional Phrases
Visually Grounded Continual Learning of Compositional Phrases
Xisen Jin
Junyi Du
Arka Sadhu
Ram Nevatia
Xiang Ren
CLL
61
4
0
02 May 2020
Obtaining Faithful Interpretations from Compositional Neural Networks
Obtaining Faithful Interpretations from Compositional Neural Networks
Sanjay Subramanian
Ben Bogin
Nitish Gupta
Tomer Wolfson
Sameer Singh
Jonathan Berant
Matt Gardner
75
42
0
02 May 2020
Probing Contextual Language Models for Common Ground with Visual
  Representations
Probing Contextual Language Models for Common Ground with Visual Representations
Gabriel Ilharco
Rowan Zellers
Ali Farhadi
Hannaneh Hajishirzi
118
14
0
01 May 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the
  Web
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Arjun Majumdar
Ayush Shrivastava
Stefan Lee
Peter Anderson
Devi Parikh
Dhruv Batra
LM&Ro
196
236
0
30 Apr 2020
Image Captioning through Image Transformer
Image Captioning through Image Transformer
Sen He
Wentong Liao
Hamed R. Tavakoli
M. Yang
Bodo Rosenhahn
N. Pugeault
ViT
95
94
0
29 Apr 2020
Unifying Neural Learning and Symbolic Reasoning for Spinal Medical
  Report Generation
Unifying Neural Learning and Symbolic Reasoning for Spinal Medical Report Generation
Zhongyi Han
B. Wei
Yilong Yin
Shuo Li
MedIm
59
42
0
28 Apr 2020
VD-BERT: A Unified Vision and Dialog Transformer with BERT
VD-BERT: A Unified Vision and Dialog Transformer with BERT
Yue Wang
Shafiq Joty
Michael R. Lyu
Irwin King
Caiming Xiong
Guosheng Lin
114
104
0
28 Apr 2020
Fashionpedia: Ontology, Segmentation, and an Attribute Localization
  Dataset
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Menglin Jia
Mengyun Shi
Mikhail Sirotenko
Huayu Chen
Claire Cardie
B. Hariharan
Hartwig Adam
Serge J. Belongie
93
97
0
26 Apr 2020
Deep Multimodal Neural Architecture Search
Deep Multimodal Neural Architecture Search
Zhou Yu
Yuhao Cui
Jun-chen Yu
Meng Wang
Dacheng Tao
Qi Tian
70
100
0
25 Apr 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
71
23
0
24 Apr 2020
Experience Grounds Language
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
124
361
0
21 Apr 2020
Previous
123...222324...313233
Next