Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,044 papers shown
HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
Jie Guo
Meiting Wang
Yan Zhou
Bin Song
Yuhao Chi
Wei-liang Fan
Jianglong Chang
45
15
0
16 Dec 2022
Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization
Yunlong Liang
Fandong Meng
Jinan Xu
Jiaan Wang
Yufeng Chen
Jie Zhou
33
20
0
15 Dec 2022
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Su Wang
Chitwan Saharia
Ceslee Montgomery
Jordi Pont-Tuset
Shai Noy
...
Radu Soricut
Jason Baldridge
Mohammad Norouzi
Peter Anderson
William Chan
35
176
0
13 Dec 2022
Uniform Masking Prevails in Vision-Language Pretraining
Siddharth Verma
Yuchen Lu
Rui Hou
Hanchao Yu
Nicolas Ballas
Madian Khabsa
Amjad Almahairi
VLM
21
0
0
10 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
37
15
0
08 Dec 2022
Semantic-Conditional Diffusion Networks for Image Captioning
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
30
62
0
06 Dec 2022
Controllable Image Captioning via Prompting
Ning Wang
Jiahao Xie
Jihao Wu
Mingbo Jia
Linlin Li
22
23
0
04 Dec 2022
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
27
318
0
01 Dec 2022
Multimodal Query-guided Object Localization
Aditay Tripathi
Rajath R Dani
Anand Mishra
Anirban Chakraborty
29
0
0
01 Dec 2022
Hyperbolic Contrastive Learning for Visual Representations beyond Objects
Songwei Ge
Shlok Kumar Mishra
Simon Kornblith
Chun-Liang Li
David Jacobs
OCL
SSL
21
51
0
01 Dec 2022
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Zhuowan Li
Xingrui Wang
Elias Stengel-Eskin
Adam Kortylewski
Wufei Ma
Benjamin Van Durme
Alan Yuille
OOD
LRM
29
58
0
01 Dec 2022
Abstract Visual Reasoning with Tangram Shapes
Anya Ji
Noriyuki Kojima
N. Rush
Alane Suhr
Wai Keen Vong
Robert D. Hawkins
Yoav Artzi
LRM
17
34
0
29 Nov 2022
DiffG-RL: Leveraging Difference between State and Common Sense
Tsunehiko Tanaka
Daiki Kimura
Michiaki Tatsubori
19
0
0
29 Nov 2022
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
50
25
0
28 Nov 2022
Learning Object-Language Alignments for Open-Vocabulary Object Detection
Chuang Lin
Pei Sun
Yi-Xin Jiang
Ping Luo
Lizhen Qu
Gholamreza Haffari
Zehuan Yuan
Jianfei Cai
VLM
ObjD
29
95
0
27 Nov 2022
Conditioning Covert Geo-Location (CGL) Detection on Semantic Class Information
Binoy Saha
Sukhendu Das
27
0
0
27 Nov 2022
Who are you referring to? Coreference resolution in image narrations
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
25
3
0
26 Nov 2022
ILSGAN: Independent Layer Synthesis for Unsupervised Foreground-Background Segmentation
Qiran Zou
Yu Yang
Wing Yin Cheung
Chang-rui Liu
Xiang Ji
GAN
33
4
0
25 Nov 2022
Open-vocabulary Attribute Detection
M. A. Bravo
Sudhanshu Mittal
Simon Ging
Thomas Brox
VLM
ObjD
19
30
0
23 Nov 2022
Knowledge Prompting for Few-shot Action Recognition
Yuheng Shi
Xinxiao Wu
Hanxi Lin
VLM
19
4
0
22 Nov 2022
Teaching Structured Vision&Language Concepts to Vision&Language Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Yikang Shen
Roei Herzig
...
Donghyun Kim
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
56
70
0
21 Nov 2022
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
3DGS
34
15
0
21 Nov 2022
Intelligent Computing: The Latest Advances, Challenges and Future
Shiqiang Zhu
Ting Yu
Tao Xu
Hongyang Chen
Schahram Dustdar
...
Tariq S. Durrani
Huaimin Wang
Jiangxing Wu
Tongyi Zhang
Yunhe Pan
AI4CE
27
117
0
21 Nov 2022
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya Dutta Roy
Ashish Shah
Ser-Nam Lim
26
0
0
20 Nov 2022
Leveraging per Image-Token Consistency for Vision-Language Pre-training
Yunhao Gou
Tom Ko
Hansi Yang
James T. Kwok
Yu Zhang
Mingxuan Wang
VLM
16
10
0
20 Nov 2022
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Hao Li
Jinguo Zhu
Xiaohu Jiang
Xizhou Zhu
Hongsheng Li
...
Xiaohua Wang
Yu Qiao
Xiaogang Wang
Wenhai Wang
Jifeng Dai
MLLM
26
55
0
17 Nov 2022
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
James Smith
Paola Cascante-Bonilla
Assaf Arbelle
Donghyun Kim
Yikang Shen
David D. Cox
Diyi Yang
Z. Kira
Rogerio Feris
Leonid Karlinsky
VLM
47
20
0
17 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
24
27
0
17 Nov 2022
MapQA: A Dataset for Question Answering on Choropleth Maps
Shuaichen Chang
David Palzer
Jialin Li
Eric Fosler-Lussier
N. Xiao
19
40
0
15 Nov 2022
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation
Shijia Huang
Feng Li
Hao Zhang
Siyi Liu
Lei Zhang
Liwei Wang
30
5
0
15 Nov 2022
Category-Adaptive Label Discovery and Noise Rejection for Multi-label Image Recognition with Partial Positive Labels
Tao Pu
Q. Lao
Hefeng Wu
Tianshui Chen
Liang Lin
23
2
0
15 Nov 2022
Probabilistic Debiasing of Scene Graphs
Bashirul Azam Biswas
Qian Ji
22
11
0
11 Nov 2022
SSGVS: Semantic Scene Graph-to-Video Synthesis
Yuren Cong
Jinhui Yi
Bodo Rosenhahn
M. Yang
67
7
0
11 Nov 2022
Watching the News: Towards VideoQA Models that can Read
Soumya Jahagirdar
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
32
18
0
10 Nov 2022
Towards Reasoning-Aware Explainable VQA
Rakesh Vaideeswaran
Feng Gao
Abhinav Mathur
Govind Thattai
LRM
46
3
0
09 Nov 2022
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji
Seungjun Nah
Xun Huang
Arash Vahdat
Jiaming Song
...
Timo Aila
S. Laine
Bryan Catanzaro
Tero Karras
Xuan Li
VLM
MoE
70
804
0
02 Nov 2022
Training Vision-Language Models with Less Bimodal Supervision
Elad Segal
Ben Bogin
Jonathan Berant
VLM
21
2
0
01 Nov 2022
Towards Language-driven Scientific AI
José Manuél Gómez-Pérez
34
0
0
27 Oct 2022
Visual Semantic Parsing: From Images to Abstract Meaning Representation
M. A. Abdelsalam
Zhan Shi
Federico Fancellu
Kalliopi Basioti
Dhaivat Bhatt
Vladimir Pavlovic
Afsaneh Fazly
GNN
37
4
0
26 Oct 2022
Search for Concepts: Discovering Visual Concepts Using Direct Optimization
P. Reddy
Paul Guerrero
Niloy J. Mitra
OCL
21
4
0
25 Oct 2022
Multilingual Multimodal Learning with Machine Translated Text
Chen Qiu
Dan Oneaţă
Emanuele Bugliarello
Stella Frank
Desmond Elliott
48
13
0
24 Oct 2022
Extending Phrase Grounding with Pronouns in Visual Dialogues
Panzhong Lu
Xin Zhang
Meishan Zhang
Min Zhang
ObjD
30
4
0
23 Oct 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
Yangfan Zhan
Zhitong Xiong
Yuan Yuan
78
106
0
23 Oct 2022
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling
Dongsheng Chen
Chaofan Tao
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
VLM
29
18
0
21 Oct 2022
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent?
Pradip Pramanick
Chayan Sarkar
24
7
0
21 Oct 2022
Scene Text Recognition with Semantics
Joshua Cesare Placidi
Yishu Miao
Zixu Wang
Lucia Specia
21
1
0
19 Oct 2022
Grounded Video Situation Recognition
Zeeshan Khan
C. V. Jawahar
Makarand Tapaswi
37
13
0
19 Oct 2022
Learning to Discover and Detect Objects
V. Fomenko
Ismail Elezi
Deva Ramanan
Laura Leal-Taixé
Aljosa Osep
ObjD
33
10
0
19 Oct 2022
Dense but Efficient VideoQA for Intricate Compositional Reasoning
Jihyeon Janel Lee
Wooyoung Kang
Eun-Sol Kim
CoGe
19
3
0
19 Oct 2022
Commonsense Knowledge from Scene Graphs for Textual Environments
Tsunehiko Tanaka
Daiki Kimura
Michiaki Tatsubori
20
2
0
19 Oct 2022