Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,650 papers shown
Semantic Compositional Learning for Low-shot Scene Graph Generation
Tao He
Lianli Gao
Jingkuan Song
Jianfei Cai
Yuan-Fang Li
CoGe
88
8
0
19 Aug 2021
Exploiting Scene Graphs for Human-Object Interaction Detection
Tao He
Lianli Gao
Jingkuan Song
Yuan-Fang Li
80
27
0
19 Aug 2021
The Multi-Modal Video Reasoning and Analyzing Competition
Haoran Peng
He Huang
Li Xu
Tianjiao Li
Jing Liu
...
Yuanzhong Liu
Tao He
Fuwei Zhang
Xianbin Liu
Tao Lin
45
2
0
18 Aug 2021
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
88
21
0
16 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
64
56
0
16 Aug 2021
Sharing Cognition: Human Gesture and Natural Language Grounding Based Planning and Navigation for Indoor Robots
Gourav Kumar
Soumyadip Maity
R. Roychoudhury
Brojeshwar Bhowmick
LM&Ro
35
1
0
14 Aug 2021
Cross-Modal Graph with Meta Concepts for Video Captioning
Hao Wang
Guosheng Lin
Steven C. H. Hoi
Chunyan Miao
67
7
0
14 Aug 2021
Unconditional Scene Graph Generation
Sarthak Garg
Helisa Dhamo
Azade Farshad
Sabrina Musatian
Nassir Navab
F. Tombari
110
25
0
12 Aug 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments
S. Srivastava
Chengshu Li
Michael Lingelbach
Roberto Martín-Martín
Fei Xia
...
Chenxi Liu
Silvio Savarese
H. Gweon
Jiajun Wu
Li Fei-Fei
LM&Ro
251
168
0
06 Aug 2021
Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
Xinzhi Dong
Chengjiang Long
Wenju Xu
Chunxia Xiao
ViT
147
68
0
05 Aug 2021
Hierarchical Representations and Explicit Memory: Learning Effective Navigation Policies on 3D Scene Graphs using Graph Neural Networks
Zachary Ravichandran
Lisa Peng
Nathan Hughes
J. D. Griffith
Luca Carlone
130
71
0
02 Aug 2021
Distributed Attention for Grounded Image Captioning
Nenglun Chen
Xingjia Pan
Runnan Chen
Lei Yang
Zhiwen Lin
Yuqiang Ren
Haolei Yuan
Xiaowei Guo
Feiyue Huang
Wenping Wang
76
21
0
02 Aug 2021
Chest ImaGenome Dataset for Clinical Reasoning
Joy T. Wu
Nkechinyere N. Agu
Ismini Lourentzou
Arjun Sharma
J. Paguio
...
William Mitchell
Satyananda Kashyap
Andrea Giovannini
Leo Anthony Celi
Mehdi Moradi
58
67
0
31 Jul 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining
Xunlin Zhan
Yangxin Wu
Xiao Dong
Yunchao Wei
Minlong Lu
Yichi Zhang
Hang Xu
Xiaodan Liang
ViT
92
67
0
30 Jul 2021
ReFormer: The Relational Transformer for Image Captioning
Xuewen Yang
Yingru Liu
Xin Wang
ViT
103
57
0
29 Jul 2021
A Thorough Review on Recent Deep Learning Methodologies for Image Captioning
Ahmed Elhagry
Karima Kadaoui
VLM
65
17
0
28 Jul 2021
Is Object Detection Necessary for Human-Object Interaction Recognition?
Ying Jin
Yinpeng Chen
Lijuan Wang
Jianfeng Wang
Pei Yu
Zicheng Liu
Lei Li
80
7
0
27 Jul 2021
Image Scene Graph Generation (SGG) Benchmark
Xiao Han
Jianwei Yang
Houdong Hu
Lei Zhang
Jianfeng Gao
Pengchuan Zhang
76
38
0
27 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
72
129
0
26 Jul 2021
Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
A. Bhunia
Aneeshan Sain
Amandeep Kumar
S. Ghose
Pinaki Nath Chowdhury
Yi-Zhe Song
118
56
0
26 Jul 2021
Query2Label: A Simple Transformer Way to Multi-Label Classification
Shilong Liu
Lei Zhang
Xiao Yang
Hang Su
Jun Zhu
73
193
0
22 Jul 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Steven C. H. Hoi
FaML
330
1,988
0
16 Jul 2021
Neighbor-view Enhanced Model for Vision and Language Navigation
Dongyan An
Yuankai Qi
Yan Huang
Qi Wu
Liang Wang
Tieniu Tan
LM&Ro
82
71
0
15 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV, VLM, MLLM
153
270
0
14 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP, VLM, MLLM
272
412
0
13 Jul 2021
FairyTailor: A Multimodal Generative Framework for Storytelling
Eden Bensaid
Mauro Martino
Benjamin Hoover
Hendrik Strobelt
LRM
77
20
0
13 Jul 2021
Scenes and Surroundings: Scene Graph Generation using Relation Transformer
Rajat Koner
Poulami Sinhamahapatra
Volker Tresp
ViT
99
8
0
12 Jul 2021
Modeling Explicit Concerning States for Reinforcement Learning in Visual Dialogue
Zipeng Xu
Fandong Meng
Xiaojie Wang
Duo Zheng
Chenxu Lv
Jie Zhou
OffRL
72
6
0
12 Jul 2021
Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration
Xuan Kan
Hejie Cui
Carl Yang
126
42
0
11 Jul 2021
Predicate correlation learning for scene graph generation
Lei Tao
Li Mi
Nannan Li
Xianhang Cheng
Yaosi Hu
Zhenzhong Chen
101
19
0
06 Jul 2021
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
Siddharth Karamcheti
Ranjay Krishna
Li Fei-Fei
Christopher D. Manning
96
92
0
06 Jul 2021
Recovering the Unbiased Scene Graphs from the Biased Ones
Meng-Jiun Chiou
Henghui Ding
Hanshu Yan
Changhu Wang
Roger Zimmermann
Jiashi Feng
112
119
0
05 Jul 2021
Web-Scale Generic Object Detection at Microsoft Bing
S. Chen
Saurajit Mukherjee
Unmesh Phadke
Tingting Wang
Junwon Park
Ravi Theja Yada
ObjD, VLM
99
0
0
05 Jul 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation
Jing Liu
Xinxin Zhu
Fei Liu
Longteng Guo
Zijia Zhao
...
Weining Wang
Hanqing Lu
Shiyu Zhou
Jiajun Zhang
Jinqiao Wang
82
38
0
01 Jul 2021
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
58
2
0
28 Jun 2021
Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
Riko Suzuki
Hitomi Yanaka
K. Mineshima
D. Bekki
VGen, MLLM
53
1
0
27 Jun 2021
Core Challenges in Embodied Vision-Language Planning
Jonathan M Francis
Nariaki Kitamura
Felix Labelle
Xiaopeng Lu
Ingrid Navarro
Jean Oh
LM&Ro
144
48
0
26 Jun 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
242
793
0
25 Jun 2021
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
94
89
0
25 Jun 2021
Structured Sparse R-CNN for Direct Scene Graph Generation
Yao Teng
Limin Wang
3DPC, GNN
115
56
0
21 Jun 2021
Exploring Semantic Relationships for Unpaired Image Captioning
Fenglin Liu
Meng Gao
Tianhao Zhang
Yuexian Zou
142
7
0
20 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
82
54
0
19 Jun 2021
Learning to Predict Visual Attributes in the Wild
Khoi Pham
Kushal Kafle
Zhe Lin
Zhi Ding
Scott D. Cohen
Q. Tran
Abhinav Shrivastava
52
113
0
17 Jun 2021
How Modular Should Neural Module Networks Be for Systematic Generalization?
Vanessa D’Amario
Tomotake Sasaki
Xavier Boix
69
17
0
15 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin, MQ, AI4MH
177
863
0
14 Jun 2021
Supervising the Transfer of Reasoning Patterns in VQA
Corentin Kervadec
Christian Wolf
G. Antipov
M. Baccouche
Madiha Nadri Wolf
79
11
0
10 Jun 2021
PAM: Understanding Product Images in Cross Product Category Attribute Extraction
Rongmei Lin
Xiang He
J. Feng
Nasser Zalmout
Yan Liang
Li Xiong
Xin Luna Dong
88
36
0
08 Jun 2021
SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation
Ioannis V. Kazakos
Carles Ventura
Míriam Bellver
Carina Silberer
Xavier Giró-i-Nieto
DiffM
23
2
0
08 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Muchen Li
Leonid Sigal
ObjD
116
197
0
06 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM, LRM
104
384
0
04 Jun 2021