ResearchTrend.AI

arXiv: 1602.07332

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,650 papers shown
Title
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL VLM MLLM
303
907
0
16 Aug 2019
Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song
Shizhe Chen
Yida Zhao
Qin Jin
SSL
52
41
0
15 Aug 2019
3-D Scene Graph: A Sparse and Semantic Representation of Physical Environments for Intelligent Agents
Ue-Hwan Kim
Jin-Man Park
Taek-jin Song
Jong-hwan Kim
3DV
81
108
0
14 Aug 2019
Semi-Supervised Learning using Differentiable Reasoning
Emile van Krieken
Erman Acar
F. V. Harmelen
DRL
67
21
0
13 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
107
38
0
12 Aug 2019
Who, Where, and What to Wear? Extracting Fashion Knowledge from Social Media
Yunshan Ma
Xun Yang
Lizi Liao
Yixin Cao
Tat-Seng Chua
114
49
0
12 Aug 2019
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Tan Wang
Xing Xu
Yang Yang
Alan Hanjalic
Heng Tao Shen
Jingkuan Song
58
149
0
12 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
255
1,975
0
09 Aug 2019
CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense
Difei Gao
Ruiping Wang
Shiguang Shan
Xilin Chen
CoGe LRM
129
28
0
08 Aug 2019
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition
Kaiyu Yang
Olga Russakovsky
Jia Deng
3DPC
89
63
0
07 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL VLM
323
3,716
0
06 Aug 2019
Aligning Linguistic Words and Visual Semantic Units for Image Captioning
Longteng Guo
Jing Liu
Jinhui Tang
Jiangwei Li
W. Luo
Hanqing Lu
83
102
0
06 Aug 2019
Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector
Qi Fan
Wei Zhuo
Chi-Keung Tang
Yu-Wing Tai
ObjD
174
533
0
06 Aug 2019
Cascaded Revision Network for Novel Object Captioning
Qianyu Feng
Yu Wu
Hehe Fan
C. Yan
Yezhou Yang
50
35
0
06 Aug 2019
Answering Questions about Data Visualizations using Efficient Bimodal Fusion
Kushal Kafle
Robik Shrestha
Brian L. Price
Scott D. Cohen
Christopher Kanan
74
60
0
05 Aug 2019
Visual-Relation Conscious Image Generation from Structured-Text
D. Vo
Akihiro Sugimoto
79
17
0
05 Aug 2019
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
Jing Wang
Yingwei Pan
Ting Yao
Jinhui Tang
Tao Mei
VLM BDL DiffM
61
36
0
01 Aug 2019
Learning Question-Guided Video Representation for Multi-Turn Video Question Answering
Guan-Lin Chao
Abhinav Rastogi
Semih Yavuz
Dilek Z. Hakkani-Tür
Jindong Chen
Ian Lane
51
6
0
31 Jul 2019
Finding Moments in Video Collections Using Natural Language
Victor Escorcia
Mattia Soldan
Josef Sivic
Guohao Li
Bryan C. Russell
55
7
0
30 Jul 2019
V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices
Damien Teney
Peng Wang
Jiewei Cao
Lingqiao Liu
Chunhua Shen
Anton Van Den Hengel
65
31
0
29 Jul 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
77
51
0
28 Jul 2019
Real-time Visual Object Tracking with Natural Language Description
Qi Feng
Vitaly Ablavsky
Qinxun Bai
Guorong Li
Stan Sclaroff
VLM ObjD VOT
137
53
0
26 Jul 2019
Bilinear Graph Networks for Visual Question Answering
Dalu Guo
Chang Xu
Dacheng Tao
GNN
86
54
0
23 Jul 2019
Position Focused Attention Network for Image-Text Matching
Yaxiong Wang
Hao-Hsiang Yang
Xueming Qian
Lin Ma
Jing Lu
Biao Li
Xin Fan
54
172
0
23 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
141
136
0
22 Jul 2019
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
Shantipriya Parida
Ondrej Bojar
S. Dash
77
63
0
21 Jul 2019
CraftAssist: A Framework for Dialogue-enabled Interactive Agents
Jonathan Gray
Kavya Srinet
Yacine Jernite
Haonan Yu
Zhuoyuan Chen
Demi Guo
Siddharth Goyal
C. L. Zitnick
Arthur Szlam
78
39
0
19 Jul 2019
Two-stream Spatiotemporal Feature for Video QA Task
Chiwan Song
Woobin Im
Sung-eui Yoon
21
0
0
11 Jul 2019
Learning by Abstraction: The Neural State Machine
Drew A. Hudson
Christopher D. Manning
NAI OCL
131
262
0
09 Jul 2019
Kite: Automatic speech recognition for unmanned aerial vehicles
Dan Oneaţă
H. Cucu
40
13
0
02 Jul 2019
Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
63
112
0
02 Jul 2019
ICDAR 2019 Competition on Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
74
76
0
30 Jun 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
118
811
0
25 Jun 2019
Integrating Knowledge and Reasoning in Image Understanding
Somak Aditya
Yezhou Yang
Chitta Baral
OCL
75
41
0
24 Jun 2019
Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019
Xiaohan Wang
Yu Wu
Linchao Zhu
Yi Yang
77
19
0
22 Jun 2019
The Limited Multi-Label Projection Layer
Brandon Amos
V. Koltun
J. Zico Kolter
95
36
0
20 Jun 2019
Expressing Visual Relationships via Language
Hao Tan
Franck Dernoncourt
Zhe Lin
Trung Bui
Joey Tianyi Zhou
90
68
0
18 Jun 2019
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Kaidi Cao
Colin Wei
Adrien Gaidon
Nikos Arechiga
Tengyu Ma
139
1,617
0
18 Jun 2019
ParNet: Position-aware Aggregated Relation Network for Image-Text matching
Yaxian Xia
Lun Huang
Wenmin Wang
Xiao-Yong Wei
Jie Chen
126
1
0
17 Jun 2019
Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding
Jian Zheng
S. Krishnamurthy
Ruxin Chen
Min-Hung Chen
Zhenhao Ge
Xiaohua Li
77
4
0
16 Jun 2019
Know What You Don't Know: Modeling a Pragmatic Speaker that Refers to Objects of Unknown Categories
Sina Zarrieß
David Schlangen
47
16
0
13 Jun 2019
Does Learning Require Memorization? A Short Tale about a Long Tail
Vitaly Feldman
TDI
200
504
0
12 Jun 2019
Learning Predicates as Functions to Enable Few-shot Scene Graph Prediction
Apoorva Dornadula
Austin Narcomey
Ranjay Krishna
Michael S. Bernstein
Li Fei-Fei
OCL
58
3
0
12 Jun 2019
Relationship-Embedded Representation Learning for Grounding Referring Expressions
Sibei Yang
Guanbin Li
Yizhou Yu
ObjD
93
55
0
11 Jun 2019
Learning Representations of Graph Data -- A Survey
Mital Kinderkhedia
GNN
79
12
0
07 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
146
478
0
06 Jun 2019
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Zhengjun Zha
Daqing Liu
Hanwang Zhang
Yongdong Zhang
Feng Wu
66
122
0
06 Jun 2019
The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue
J. Haber
Tim Baumgärtner
Ece Takmaz
Lieke Gelderloos
Elia Bruni
Raquel Fernández
64
77
0
04 Jun 2019
Relational Reasoning using Prior Knowledge for Visual Captioning
Jingyi Hou
Xinxiao Wu
Yayun Qi
Wentian Zhao
Jiebo Luo
Yunde Jia
85
14
0
04 Jun 2019
Generating Question Relevant Captions to Aid Visual Question Answering
Jialin Wu
Zeyuan Hu
Raymond J. Mooney
121
43
0
03 Jun 2019