ResearchTrend.AI
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,650 papers shown
LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation
Jiaxin Cheng
Xiao Liang
Xingjian Shi
Tong He
Tianjun Xiao
Mu Li
DiffM
82
69
0
16 Feb 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
80
29
0
16 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
95
7
0
16 Feb 2023
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu
Hui Ding
Zhaowei Cai
Yuting Zhang
R. Satzoda
Vijay Mahadevan
R. Manmatha
ObjD
123
133
0
14 Feb 2023
Context Understanding in Computer Vision: A Survey
Xuan Wang
Zhigang Zhu
103
52
0
10 Feb 2023
1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop
Qixun Wang
Xiaofeng Guo
Haofan Wang
ViT
58
4
0
06 Feb 2023
Controlling for Stereotypes in Multimodal Language Model Evaluation
Manuj Malik
Richard Johansson
131
1
0
03 Feb 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
52
1
0
28 Jan 2023
Multimodal Event Transformer for Image-guided Story Ending Generation
Yucheng Zhou
Guodong Long
78
20
0
26 Jan 2023
Vision-Language Models Performing Zero-Shot Tasks Exhibit Gender-based Disparities
Melissa Hall
Laura Gustafson
Aaron B. Adcock
Ishan Misra
Candace Ross
VLM
101
24
0
26 Jan 2023
Implicit Shape Model Trees: Recognition of 3-D Indoor Scenes and Prediction of Object Poses for Mobile Robots
Pascal Meissner
Rüdiger Dillmann
3DPC
54
0
0
25 Jan 2023
OvarNet: Towards Open-vocabulary Object Attribute Recognition
Keyan Chen
Xiaolong Jiang
Yao Hu
Xu Tang
Yan Gao
Jianqi Chen
Weidi Xie
VLM, ObjD
76
41
0
23 Jan 2023
Towards Models that Can See and Read
Roy Ganz
Oren Nuriel
Aviad Aberdam
Yair Kittenplon
Shai Mazor
Ron Litman
71
13
0
18 Jan 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation
Yuheng Li
Haotian Liu
Qingyang Wu
Fangzhou Mu
Jianwei Yang
Jianfeng Gao
Chunyuan Li
Yong Jae Lee
VLM
150
603
1
17 Jan 2023
USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval
Yan Zhang
Zhong Ji
Dingrong Wang
Yanwei Pang
Xuelong Li
VLM
66
23
0
17 Jan 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Songlin Yang
Yining Hong
Hao Zhang
Chuang Gan
LRM, VLM
116
41
0
12 Jan 2023
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Xinsong Zhang
Yan Zeng
Jipeng Zhang
Hang Li
VLM, AI4CE, LRM
122
17
0
12 Jan 2023
Graph based Environment Representation for Vision-and-Language Navigation in Continuous Environments
Ting Wang
Zongkai Wu
Feiyu Yao
Donglin Wang
123
5
0
11 Jan 2023
Universal Multimodal Representation for Language Understanding
Zhuosheng Zhang
Kehai Chen
Rui Wang
Masao Utiyama
Eiichiro Sumita
Z. Li
Hai Zhao
SSL
109
22
0
09 Jan 2023
Rethinking Explaining Graph Neural Networks via Non-parametric Subgraph Matching
Fang Wu
Siyuan Li
Xurui Jin
Yinghui Jiang
Dragomir R. Radev
Z. Niu
Stan Z. Li
78
11
0
07 Jan 2023
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
Filip Radenovic
Abhimanyu Dubey
Abhishek Kadian
Todor Mihaylov
Simon Vandenhende
Yash J. Patel
Y. Wen
Vignesh Ramanathan
D. Mahajan
VLM
89
86
0
05 Jan 2023
Learning Trajectory-Word Alignments for Video-Language Tasks
Xu Yang
Zhang Li
Haiyang Xu
Hanwang Zhang
Qinghao Ye
Chenliang Li
Ming Yan
Yu Zhang
Fei Huang
Songfang Huang
80
7
0
05 Jan 2023
PACO: Parts and Attributes of Common Objects
Vignesh Ramanathan
Anmol Kalia
Vladan Petrovic
Yiqian Wen
Baixue Zheng
...
Abhishek Kadian
Amir Mousavi
Yi-Zhe Song
Abhimanyu Dubey
D. Mahajan
VLM
96
105
0
04 Jan 2023
A Survey On Few-shot Knowledge Graph Completion with Structural and Commonsense Knowledge
Haodi Ma
D. Wang
101
8
0
03 Jan 2023
Optimization of Image Transmission in a Cooperative Semantic Communication Networks
Wenjing Zhang
Yining Wang
Mingzhe Chen
Tao Luo
Dusit Niyato
46
43
0
01 Jan 2023
Skew Class-balanced Re-weighting for Unbiased Scene Graph Generation
Haeyong Kang
Chang D. Yoo
117
6
0
01 Jan 2023
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
Qinghao Ye
Guohai Xu
Ming Yan
Haiyang Xu
Qi Qian
Ji Zhang
Fei Huang
VLM, AI4TS
225
75
0
30 Dec 2022
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Harsh Rangwani
Sumukh K Aithal
Mayank Mishra
R. Venkatesh Babu
79
31
0
28 Dec 2022
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
Woohyun Kang
Jonghwan Mun
Sungjun Lee
Byungseok Roh
VLM
97
20
0
27 Dec 2022
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
73
16
0
26 Dec 2022
Detecting Objects with Context-Likelihood Graphs and Graph Refinement
Aritra Bhowmik
Yu Wang
N. Baka
Martin R. Oswald
Cees G. M. Snoek
79
2
0
23 Dec 2022
Knowledge-driven Scene Priors for Semantic Audio-Visual Embodied Navigation
Gyan Tatiya
Jonathan M Francis
Luca Bondi
Ingrid Navarro
Eric Nyberg
Jivko Sinapov
Jean Oh
69
8
0
21 Dec 2022
Generalized Decoding for Pixel, Image, and Language
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLM, MLLM, ObjD
124
259
0
21 Dec 2022
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
139
120
0
21 Dec 2022
Does CLIP Bind Concepts? Probing Compositionality in Large Image Models
Martha Lewis
Nihal V. Nayak
Peilin Yu
Qinan Yu
Jack Merullo
Stephen H. Bach
Ellie Pavlick
VLM, OCL, CoGe
134
68
0
20 Dec 2022
Position-guided Text Prompt for Vision-Language Pre-training
Alex Jinpeng Wang
Pan Zhou
Mike Zheng Shou
Shuicheng Yan
VLM
70
38
0
19 Dec 2022
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering
Difei Gao
Luowei Zhou
Lei Ji
Linchao Zhu
Yezhou Yang
Mike Zheng Shou
87
60
0
19 Dec 2022
Universal Object Detection with Large Vision Model
Feng-Huei Lin
Wenze Hu
Yaowei Wang
Yonghong Tian
Guangming Lu
Fanglin Chen
Yong-mei Xu
Xiaoyu Wang
VLM, ObjD
98
8
0
19 Dec 2022
Transferring General Multimodal Pretrained Models to Text Recognition
Junyang Lin
Xuancheng Ren
Yichang Zhang
Gao Liu
Peng Wang
An Yang
Chang Zhou
69
4
0
19 Dec 2022
Efficient Image Captioning for Edge Devices
Ning Wang
Jiangrong Xie
Hangzai Luo
Qinglin Cheng
Jihao Wu
Mingbo Jia
Linlin Li
VLM, CLIP
79
22
0
18 Dec 2022
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
118
5
0
16 Dec 2022
HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
Jie Guo
Meiting Wang
Yan Zhou
Bin Song
Yuhao Chi
Wei-liang Fan
Jianglong Chang
78
16
0
16 Dec 2022
MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks
Letitia Parcalabescu
Anette Frank
88
28
0
15 Dec 2022
FlexiViT: One Model for All Patch Sizes
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim Alabdulmohsin
Filip Pavetić
VLM
153
94
0
15 Dec 2022
Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization
Yunlong Liang
Fandong Meng
Jinan Xu
Jiaan Wang
Jinan Xu
Jie Zhou
103
22
0
15 Dec 2022
Reproducible scaling laws for contrastive language-image learning
Mehdi Cherti
Romain Beaumont
Ross Wightman
Mitchell Wortsman
Gabriel Ilharco
Cade Gordon
Christoph Schuhmann
Ludwig Schmidt
J. Jitsev
VLM, CLIP
139
824
0
14 Dec 2022
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Su Wang
Chitwan Saharia
Ceslee Montgomery
Jordi Pont-Tuset
Shai Noy
...
Radu Soricut
Jason Baldridge
Mohammad Norouzi
Peter Anderson
William Chan
98
188
0
13 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
94
142
0
13 Dec 2022
The Hateful Memes Challenge Next Move
Weijun Jin
Lance Wilhelm
VLM
93
1
0
13 Dec 2022
Uniform Masking Prevails in Vision-Language Pretraining
Siddharth Verma
Yuchen Lu
Rui Hou
Hanchao Yu
Nicolas Ballas
Madian Khabsa
Amjad Almahairi
VLM
50
0
0
10 Dec 2022