ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1602.07332
  4. Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense
  Image Annotations

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
ArXivPDFHTML

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,062 papers shown
Title
Learning to Discover and Detect Objects
Learning to Discover and Detect Objects
V. Fomenko
Ismail Elezi
Deva Ramanan
Laura Leal-Taixé
Aljosa Osep
ObjD
33
10
0
19 Oct 2022
Dense but Efficient VideoQA for Intricate Compositional Reasoning
Dense but Efficient VideoQA for Intricate Compositional Reasoning
Jihyeon Janel Lee
Wooyoung Kang
Eun-Sol Kim
CoGe
19
3
0
19 Oct 2022
Commonsense Knowledge from Scene Graphs for Textual Environments
Commonsense Knowledge from Scene Graphs for Textual Environments
Tsunehiko Tanaka
Daiki Kimura
Michiaki Tatsubori
20
2
0
19 Oct 2022
Contrastive Language-Image Pre-Training with Knowledge Graphs
Contrastive Language-Image Pre-Training with Knowledge Graphs
Xuran Pan
Tianzhu Ye
Dongchen Han
S. Song
Gao Huang
VLM
CLIP
30
43
0
17 Oct 2022
Plausible May Not Be Faithful: Probing Object Hallucination in
  Vision-Language Pre-training
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Wenliang Dai
Zihan Liu
Ziwei Ji
Dan Su
Pascale Fung
MLLM
VLM
32
62
0
14 Oct 2022
Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets
Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets
Anurag Roy
David Johnson Ekka
Saptarshi Ghosh
Abir Das
23
1
0
13 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal
  Contrastive Learning
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
20
68
0
12 Oct 2022
Transformer-based Localization from Embodied Dialog with Large-scale
  Pre-training
Transformer-based Localization from Embodied Dialog with Large-scale Pre-training
Meera Hahn
James M. Rehg
LM&Ro
40
4
0
10 Oct 2022
Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced
  Hard Negatives for Cross-modal Information Retrieval
Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval
Yan Gong
Georgina Cosma
27
11
0
10 Oct 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language
  Representation Learning
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
21
8
0
09 Oct 2022
LOCL: Learning Object-Attribute Composition using Localization
LOCL: Learning Object-Attribute Composition using Localization
Satish Kumar
A S M Iftekhar
Ekta Prashnani
B.S.Manjunath
19
3
0
07 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image
  Captioning
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
40
10
0
04 Oct 2022
Unbiased Scene Graph Generation using Predicate Similarities
Unbiased Scene Graph Generation using Predicate Similarities
Misaki Ohashi
Yusuke Matsui
32
1
0
03 Oct 2022
Data Poisoning Attacks Against Multimodal Encoders
Data Poisoning Attacks Against Multimodal Encoders
Ziqing Yang
Xinlei He
Zheng Li
Michael Backes
Mathias Humbert
Pascal Berrang
Yang Zhang
AAML
116
45
0
30 Sep 2022
DRAMA: Joint Risk Localization and Captioning in Driving
DRAMA: Joint Risk Localization and Captioning in Driving
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
107
87
0
22 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question
  Answering
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
39
21
0
21 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,113
0
20 Sep 2022
The Ability of Image-Language Explainable Models to Resemble Domain
  Expertise
The Ability of Image-Language Explainable Models to Resemble Domain Expertise
P. Werner
Anna Zapaishchykova
Ujjwal Ratan
48
2
0
19 Sep 2022
3D VSG: Long-term Semantic Scene Change Prediction through 3D Variable
  Scene Graphs
3D VSG: Long-term Semantic Scene Change Prediction through 3D Variable Scene Graphs
Sam Looper
Javier Rodriguez Puigvert
Roland Siegwart
Cesar Cadena
L. Schmid
3DPC
13
22
0
16 Sep 2022
VIPHY: Probing "Visible" Physical Commonsense Knowledge
VIPHY: Probing "Visible" Physical Commonsense Knowledge
Shikhar Singh
Ehsan Qasemi
Muhao Chen
46
6
0
15 Sep 2022
Combining Metric Learning and Attention Heads For Accurate and Efficient
  Multilabel Image Classification
Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification
K. Prokofiev
V. Sovrasov
VLM
26
9
0
14 Sep 2022
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language
  Representation Alignment
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
Hongwei Xue
Yuchong Sun
Bei Liu
Jianlong Fu
Rui Song
Houqiang Li
Jiebo Luo
CLIP
VLM
25
68
0
14 Sep 2022
Towards explainable evaluation of language models on the semantic
  similarity of visual concepts
Towards explainable evaluation of language models on the semantic similarity of visual concepts
Maria Lymperaiou
George Manoliadis
Orfeas Menis Mastromichalakis
Edmund Dervakos
Giorgos Stamou
AAML
24
5
0
08 Sep 2022
Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering
Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering
Jiong Wang
Zhou Zhao
Weike Jin
18
0
0
08 Sep 2022
VGStore: A Multimodal Extension to SPARQL for Querying RDF Scene Graph
VGStore: A Multimodal Extension to SPARQL for Querying RDF Scene Graph
Yanzeng Li
Zilong Zheng
Wenjuan Han
Lei Zou
34
2
0
07 Sep 2022
Scalable Regularization of Scene Graph Generation Models using Symbolic
  Theories
Scalable Regularization of Scene Graph Generation Models using Symbolic Theories
Davide Buffelli
Efthymia Tsamoura
25
2
0
06 Sep 2022
Design of the topology for contrastive visual-textual alignment
Design of the topology for contrastive visual-textual alignment
Zhun Sun
30
1
0
05 Sep 2022
Interactive Question Answering Systems: Literature Review
Interactive Question Answering Systems: Literature Review
Giovanni Maria Biancofiore
Yashar Deldjoo
Tommaso Di Noia
E. Sciascio
Fedelucio Narducci
34
13
0
04 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked
  Visual Modeling
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
32
64
0
04 Sep 2022
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis
Wanshu Fan
Yen-Chun Chen
Dongdong Chen
Yu Cheng
Lu Yuan
Yu-Chiang Frank Wang
DiffM
34
90
0
29 Aug 2022
MuMUR : Multilingual Multimodal Universal Retrieval
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
44
3
0
24 Aug 2022
FashionVQA: A Domain-Specific Visual Question Answering System
FashionVQA: A Domain-Specific Visual Question Answering System
Min Wang
A. Mahjoubfar
Anupama Joshi
29
4
0
24 Aug 2022
Learning More May Not Be Better: Knowledge Transferability in Vision and
  Language Tasks
Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks
Tianwei Chen
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Hajime Nagahara
VLM
41
0
0
23 Aug 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and
  Vision-Language Tasks
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
54
629
0
22 Aug 2022
VLMAE: Vision-Language Masked Autoencoder
VLMAE: Vision-Language Masked Autoencoder
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Chen Wu
Xiujun Shu
Bohan Ren
VLM
34
11
0
19 Aug 2022
See Finer, See More: Implicit Modality Alignment for Text-based Person
  Retrieval
See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval
Xiujun Shu
Wei Wen
Haoqian Wu
Keyun Chen
Yi-Zhe Song
Ruizhi Qiao
Bohan Ren
Xiao Wang
27
91
0
18 Aug 2022
Towards Open-vocabulary Scene Graph Generation with Prompt-based
  Finetuning
Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning
Tao He
Lianli Gao
Jingkuan Song
Yuan-Fang Li
VLM
34
50
0
17 Aug 2022
Context-aware Mixture-of-Experts for Unbiased Scene Graph Generation
Context-aware Mixture-of-Experts for Unbiased Scene Graph Generation
Liguang Zhou
Yuhongze Zhou
Tin Lun Lam
Yangsheng Xu
EDL
MoE
28
2
0
15 Aug 2022
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language
  Pre-training
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training
Jaeseok Byun
Taebaek Hwang
Jianlong Fu
Taesup Moon
VLM
23
11
0
08 Aug 2022
Masked Vision and Language Modeling for Multi-modal Representation
  Learning
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
36
67
0
03 Aug 2022
Rethinking the Evaluation of Unbiased Scene Graph Generation
Rethinking the Evaluation of Unbiased Scene Graph Generation
Xingchen Li
Long Chen
Jian Shao
Shaoning Xiao
Songyang Zhang
Jun Xiao
42
12
0
03 Aug 2022
Integrating Object-aware and Interaction-aware Knowledge for Weakly
  Supervised Scene Graph Generation
Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation
Xingchen Li
Long Chen
Wenbo Ma
Yi Yang
Jun Xiao
21
26
0
03 Aug 2022
Augmenting Vision Language Pretraining by Learning Codebook with Visual
  Semantics
Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics
Xiaoyuan Guo
Jiali Duan
C.-C. Jay Kuo
J. Gichoya
Imon Banerjee
VLM
25
1
0
31 Jul 2022
Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI
  Detection
Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection
Xiaoqian Wu
Yong-Lu Li
Xinpeng Liu
Junyi Zhang
Yuzhe Wu
Cewu Lu
29
37
0
28 Jul 2022
Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
Li Xu
Haoxuan Qu
Jason Kuen
Jiuxiang Gu
Jun Liu
CML
31
27
0
23 Jul 2022
Panoptic Scene Graph Generation
Panoptic Scene Graph Generation
Jingkang Yang
Yi Zhe Ang
Zujin Guo
Kaiyang Zhou
Wayne Zhang
Ziwei Liu
47
106
0
22 Jul 2022
Human-centric Image Cropping with Partition-aware and Content-preserving
  Features
Human-centric Image Cropping with Partition-aware and Content-preserving Features
Bo Zhang
Li Niu
Xing Zhao
Liqing Zhang
21
5
0
21 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Is an Object-Centric Video Representation Beneficial for Transfer?
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
37
27
0
20 Jul 2022
ViGAT: Bottom-up event recognition and explanation in video using
  factorized graph attention network
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
19
10
0
20 Jul 2022
GRIT: Faster and Better Image captioning Transformer Using Dual Visual
  Features
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
ViT
36
106
0
20 Jul 2022
Previous
123...789...202122
Next