ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1602.07332
  4. Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense
  Image Annotations

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
ArXivPDFHTML

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,034 papers shown
Title
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image
  Captioning with Neural Scene Graph Generators
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
27
37
0
22 Sep 2019
Triplet-Aware Scene Graph Embeddings
Triplet-Aware Scene Graph Embeddings
Brigit Schroeder
Subarna Tripathi
Hanlin Tang
3DPC
33
16
0
19 Sep 2019
Scene Graph Parsing by Attention Graph
Scene Graph Parsing by Attention Graph
Martin Andrews
Yew Ken Chia
Sam Witteveen
GNN
30
11
0
13 Sep 2019
Specifying Object Attributes and Relations in Interactive Scene
  Generation
Specifying Object Attributes and Relations in Interactive Scene Generation
Oron Ashual
Lior Wolf
36
178
0
11 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through
  Entailed Question Generation
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
24
65
0
10 Sep 2019
Explainable Video Action Reasoning via Prior Knowledge and State
  Transitions
Explainable Video Action Reasoning via Prior Knowledge and State Transitions
Tao Zhuo
Zhiyong Cheng
Peng Zhang
Yongkang Wong
Mohan Kankanhalli
FAtt
33
60
0
28 Aug 2019
Situational Fusion of Visual Representation for Visual Navigation
Situational Fusion of Visual Representation for Visual Navigation
Bokui (William) Shen
Danfei Xu
Yuke Zhu
Leonidas J. Guibas
Fei-Fei Li
Silvio Savarese
SSL
27
62
0
24 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
58
1,650
0
22 Aug 2019
Are We Modeling the Task or the Annotator? An Investigation of Annotator
  Bias in Natural Language Understanding Datasets
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
242
320
0
21 Aug 2019
Phrase Localization Without Paired Training Examples
Phrase Localization Without Paired Training Examples
Josiah Wang
Lucia Specia
35
41
0
20 Aug 2019
Image Synthesis From Reconfigurable Layout and Style
Image Synthesis From Reconfigurable Layout and Style
Wei Sun
Tianfu Wu
27
140
0
20 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from
  Transformers
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
93
2,453
0
20 Aug 2019
Proposal-free Temporal Moment Localization of a Natural-Language Query
  in Video using Guided Attention
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
F. Saleh
Hongdong Li
Stephen Gould
27
147
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
30
156
0
20 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information
  Bottleneck
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
Shuang Ma
Daniel J. McDuff
Yale Song
25
22
0
19 Aug 2019
Attention on Attention for Image Captioning
Attention on Attention for Image Captioning
Lun Huang
Wenmin Wang
Jie Chen
Xiao-Yong Wei
24
823
0
19 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal
  Pre-training
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
80
895
0
16 Aug 2019
Unpaired Cross-lingual Image Caption Generation with Self-Supervised
  Rewards
Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song
Shizhe Chen
Yida Zhao
Qin Jin
SSL
26
40
0
15 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language
  Interactions
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
27
38
0
12 Aug 2019
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Tan Wang
Xing Xu
Yang Yang
Alan Hanjalic
Heng Tao Shen
Jingkuan Song
22
145
0
12 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
67
1,917
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
87
3,624
0
06 Aug 2019
Aligning Linguistic Words and Visual Semantic Units for Image Captioning
Aligning Linguistic Words and Visual Semantic Units for Image Captioning
Longteng Guo
Jing Liu
Jinhui Tang
Jiangwei Li
W. Luo
Hanqing Lu
25
102
0
06 Aug 2019
Cascaded Revision Network for Novel Object Captioning
Cascaded Revision Network for Novel Object Captioning
Qianyu Feng
Yu Wu
Hehe Fan
C. Yan
Yezhou Yang
29
35
0
06 Aug 2019
Convolutional Auto-encoding of Sentence Topics for Image Paragraph
  Generation
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
Jing Wang
Yingwei Pan
Ting Yao
Jinhui Tang
Tao Mei
VLM
BDL
DiffM
19
36
0
01 Aug 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question
  Answering
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
23
50
0
28 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of
  Tasks, Datasets, and Methods
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
20
132
0
22 Jul 2019
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine
  Translation
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
Shantipriya Parida
Ondrej Bojar
S. Dash
33
62
0
21 Jul 2019
CraftAssist: A Framework for Dialogue-enabled Interactive Agents
CraftAssist: A Framework for Dialogue-enabled Interactive Agents
Jonathan Gray
Kavya Srinet
Yacine Jernite
Haonan Yu
Zhuoyuan Chen
Demi Guo
Siddharth Goyal
C. L. Zitnick
Arthur Szlam
33
39
0
19 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
36
796
0
25 Jun 2019
Integrating Knowledge and Reasoning in Image Understanding
Integrating Knowledge and Reasoning in Image Understanding
Somak Aditya
Yezhou Yang
Chitta Baral
OCL
36
40
0
24 Jun 2019
Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge
  2019
Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019
Xiaohan Wang
Yu Wu
Linchao Zhu
Yi Yang
18
19
0
22 Jun 2019
The Limited Multi-Label Projection Layer
The Limited Multi-Label Projection Layer
Brandon Amos
V. Koltun
J. Zico Kolter
19
36
0
20 Jun 2019
Image Captioning with Integrated Bottom-Up and Multi-level Residual
  Top-Down Attention for Game Scene Understanding
Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding
Jian Zheng
S. Krishnamurthy
Ruxin Chen
Min-Hung Chen
Zhenhao Ge
Xiaohua Li
40
4
0
16 Jun 2019
Does Learning Require Memorization? A Short Tale about a Long Tail
Does Learning Require Memorization? A Short Tale about a Long Tail
Vitaly Feldman
TDI
49
481
0
12 Jun 2019
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Zhengjun Zha
Daqing Liu
Hanwang Zhang
Yongdong Zhang
Feng Wu
25
119
0
06 Jun 2019
Relational Reasoning using Prior Knowledge for Visual Captioning
Relational Reasoning using Prior Knowledge for Visual Captioning
Jingyi Hou
Xinxiao Wu
Yayun Qi
Wentian Zhao
Jiebo Luo
Yunde Jia
17
14
0
04 Jun 2019
Scene Text Visual Question Answering
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
36
343
0
31 May 2019
Contextual Translation Embedding for Visual Relationship Detection and
  Scene Graph Generation
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
29
14
0
28 May 2019
Self-Critical Reasoning for Robust Visual Question Answering
Self-Critical Reasoning for Robust Visual Question Answering
Jialin Wu
Raymond J. Mooney
OOD
NAI
32
159
0
24 May 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image
  Representations
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
34
81
0
15 May 2019
Language-Conditioned Graph Networks for Relational Reasoning
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
31
171
0
10 May 2019
PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
Yikang Li
Tao Ma
Yeqi Bai
Nan Duan
Sining Wei
Xiaogang Wang
28
93
0
05 May 2019
On Exploring Undetermined Relationships for Visual Relationship
  Detection
On Exploring Undetermined Relationships for Visual Relationship Detection
Yibing Zhan
Jun-chen Yu
Ting Yu
Dacheng Tao
31
82
0
05 May 2019
Scene Graph Prediction with Limited Labels
Scene Graph Prediction with Limited Labels
V. Chen
P. Varma
Ranjay Krishna
Michael S. Bernstein
Christopher Ré
Li Fei-Fei
16
86
0
25 Apr 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
31
227
0
25 Apr 2019
Deep Metric Learning Beyond Binary Supervision
Deep Metric Learning Beyond Binary Supervision
Sungyeon Kim
Minkyo Seo
Ivan Laptev
Minsu Cho
Suha Kwak
SSL
20
94
0
21 Apr 2019
Learning to Collocate Neural Modules for Image Captioning
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
25
77
0
18 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image,
  Multi-sentence Documents
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel
Lillian Lee
David M. Mimno
31
30
0
16 Apr 2019
Previous
123...1718192021
Next