ResearchTrend.AI
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,053 papers shown
Title
Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation
Tao He
Lianli Gao
Jingkuan Song
Jianfei Cai
Yuan-Fang Li
24
30
0
13 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
30
433
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
35
488
0
11 Jun 2020
Counterfactual VQA: A Cause-Effect Look at Language Bias
Yulei Niu
Kaihua Tang
Hanwang Zhang
Zhiwu Lu
Xiansheng Hua
Ji-Rong Wen
CML
56
394
0
08 Jun 2020
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge
Peng Wang
Dongyang Liu
Hui Li
Qi Wu
ObjD
24
19
0
02 Jun 2020
Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation
Boris Knyazev
H. D. Vries
Cătălina Cangea
Graham W. Taylor
Aaron Courville
Eugene Belilovsky
30
56
0
17 May 2020
Visual Relationship Detection using Scene Graphs: A Survey
Aniket Agarwal
Ayush Mangal
Vipul
GNN
25
20
0
16 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
22
127
0
15 May 2020
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng
Quan Guo
Parisa Kordjamshidi
LRM
43
36
0
12 May 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
42
580
0
10 May 2020
Character Matters: Video Story Understanding with Character-Aware Relations
Shijie Geng
Ji Zhang
Zuohui Fu
Peng Gao
Hang Zhang
Gerard de Melo
18
11
0
09 May 2020
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
19
69
0
08 May 2020
Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions
Arjun Reddy Akula
Spandana Gella
Yaser Al-Onaizan
Song-Chun Zhu
Siva Reddy
ObjD
26
52
0
04 May 2020
Probing Contextual Language Models for Common Ground with Visual Representations
Gabriel Ilharco
Rowan Zellers
Ali Farhadi
Hannaneh Hajishirzi
30
14
0
01 May 2020
VD-BERT: A Unified Vision and Dialog Transformer with BERT
Yue Wang
Chenyu You
Michael R. Lyu
Irwin King
Caiming Xiong
Guosheng Lin
24
102
0
28 Apr 2020
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Menglin Jia
Mengyun Shi
Mikhail Sirotenko
Huayu Chen
Claire Cardie
B. Hariharan
Hartwig Adam
Serge J. Belongie
27
93
0
26 Apr 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
39
23
0
24 Apr 2020
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
21
351
0
21 Apr 2020
Are we pretraining it right? Digging deeper into visio-linguistic pretraining
Amanpreet Singh
Vedanuj Goswami
Devi Parikh
VLM
40
48
0
19 Apr 2020
Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation
M. Moghaddam
Qi Wu
Ehsan Abbasnejad
Javen Qinfeng Shi
20
4
0
07 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
50
436
0
02 Apr 2020
Graph Structured Network for Image-Text Matching
Chunxiao Liu
Zhendong Mao
Tianzhu Zhang
Hongtao Xie
Bin Wang
Yongdong Zhang
22
232
0
01 Apr 2020
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
Boxiao Pan
Haoye Cai
De-An Huang
Kuan-Hui Lee
Adrien Gaidon
Ehsan Adeli
Juan Carlos Niebles
31
235
0
31 Mar 2020
Grounded Situation Recognition
Sarah M Pratt
Mark Yatskar
Luca Weihs
Ali Farhadi
Aniruddha Kembhavi
14
111
0
26 Mar 2020
Learning Object Permanence from Video
Aviv Shamsian
Ofri Kleinfeld
Amir Globerson
Gal Chechik
SSL
39
31
0
23 Mar 2020
Visual Question Answering for Cultural Heritage
P. Bongini
Federico Becattini
Andrew D. Bagdanov
A. Bimbo
232
22
0
22 Mar 2020
Affinity Graph Supervision for Visual Recognition
Chu Wang
Babak Samari
Vladimir G. Kim
S. Chaudhuri
Kaleem Siddiqi
GNN
24
8
0
19 Mar 2020
Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge into Deep Neural Networks
Karan Sikka
Andrew Silberfarb
John Byrnes
Indranil Sur
Edmond Chow
Ajay Divakaran
R. Rohwer
NAI
19
11
0
16 Mar 2020
A Study on Multimodal and Interactive Explanations for Visual Question Answering
Kamran Alipour
J. Schulze
Yi Yao
Avi Ziskind
Giedrius Burachas
32
27
0
01 Mar 2020
Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension
Zhenfang Chen
Peng Wang
Lin Ma
Kwan-Yee K. Wong
Qi Wu
ObjD
31
67
0
01 Mar 2020
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen
Qin Jin
Peng Wang
Qi Wu
DiffM
36
215
0
01 Mar 2020
Unbiased Scene Graph Generation from Biased Training
Kaihua Tang
Yulei Niu
Jianqiang Huang
Jiaxin Shi
Hanwang Zhang
CML
22
680
0
27 Feb 2020
On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering
Xinyu Wang
Yuliang Liu
Chunhua Shen
Chun Chet Ng
Canjie Luo
Lianwen Jin
C. Chan
Anton Van Den Hengel
Liangwei Wang
31
91
0
24 Feb 2020
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
22
181
0
20 Feb 2020
Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN
Hang Xu
Linpu Fang
Xiaodan Liang
Wenxiong Kang
Zhenguo Li
ObjD
32
21
0
18 Feb 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
C. Sur
27
7
0
16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
C. Sur
25
16
0
15 Feb 2020
Object Detection as a Positive-Unlabeled Problem
Yuewei Yang
Kevin J Liang
Lawrence Carin
21
38
0
11 Feb 2020
Controlling generative models with continuous factors of variations
Antoine Plumerault
Hervé Le Borgne
Céline Hudelot
DRL
30
127
0
28 Jan 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
40
259
0
22 Jan 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
23
17
0
20 Jan 2020
Show, Recall, and Tell: Image Captioning with Recall Mechanism
Li Wang
Zechen Bai
Yonghua Zhang
Hongtao Lu
27
67
0
15 Jan 2020
Cross-dataset Training for Class Increasing Object Detection
Yongqiang Yao
Yan Wang
Yu-Xiao Guo
Jiaojiao Lin
Hongwei Qin
Junjie Yan
ObjD
24
17
0
14 Jan 2020
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
23
318
0
10 Jan 2020
Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning
Han-Jia Ye
Hong-You Chen
De-Chuan Zhan
Wei-Lun Chao
32
99
0
06 Jan 2020
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
16
685
0
31 Dec 2019
Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network
W. Ramos
M. Silva
Edson Roteia Araujo Junior
Alan C. Neves
Erickson R. Nascimento
22
6
0
29 Dec 2019
A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning
Filippos Gouidis
Alexandros Vassiliades
T. Patkos
Antonis Argyros
Nick Bassiliades
Dimitris Plexousakis
OCL
29
12
0
26 Dec 2019
Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning
Huaizheng Zhang
Yong Luo
Qiming Ai
Yonggang Wen
25
15
0
21 Dec 2019
Meshed-Memory Transformer for Image Captioning
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
14
868
0
17 Dec 2019