ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1602.07332
  4. Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense
  Image Annotations

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
ArXivPDFHTML

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,168 papers shown
Title
Weakly-Supervised Video Object Grounding via Causal Intervention
Weakly-Supervised Video Object Grounding via Causal Intervention
Wei Wang
Junyu Gao
Changsheng Xu
CML
32
20
0
01 Dec 2021
MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
Sara Atito
Muhammad Awais
Ammarah Farooq
Zhenhua Feng
J. Kittler
19
17
0
30 Nov 2021
AssistSR: Task-oriented Video Segment Retrieval for Personal AI
  Assistant
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
Stan Weixian Lei
Difei Gao
Yuxuan Wang
Dongxing Mao
Zihan Liang
L. Ran
Mike Zheng Shou
27
8
0
30 Nov 2021
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic
  Arithmetic
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
34
192
0
29 Nov 2021
Not All Relations are Equal: Mining Informative Labels for Scene Graph
  Generation
Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
35
30
0
26 Nov 2021
Scene Graph Generation with Geometric Context
Scene Graph Generation with Geometric Context
Vishal Kumar
Albert Mundu
S. Singh
GNN
3DV
21
2
0
25 Nov 2021
Detecting and Tracking Small and Dense Moving Objects in Satellite
  Videos: A Benchmark
Detecting and Tracking Small and Dense Moving Objects in Satellite Videos: A Benchmark
Qian Yin
Qingyong Hu
Hao Liu
Feng Zhang
Yingqian Wang
Zaiping Lin
Wei An
Yulan Guo
31
87
0
25 Nov 2021
Generating More Pertinent Captions by Leveraging Semantics and Style on
  Multi-Source Datasets
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
Marcella Cornia
Lorenzo Baraldi
G. Fiameni
Rita Cucchiara
20
12
0
24 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token
  Modeling
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
53
218
0
24 Nov 2021
Hierarchical Modular Network for Video Captioning
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
27
67
0
24 Nov 2021
Scaling Up Vision-Language Pre-training for Image Captioning
Scaling Up Vision-Language Pre-training for Image Captioning
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Zhengyuan Yang
Zicheng Liu
Yumao Lu
Lijuan Wang
MLLM
VLM
34
246
0
24 Nov 2021
Multiset-Equivariant Set Prediction with Approximate Implicit
  Differentiation
Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation
Yan Zhang
David W. Zhang
Simon Lacoste-Julien
Gertjan J. Burghouts
Cees G. M. Snoek
BDL
48
21
0
23 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language
  Modeling
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Faisal Ahmed
Zicheng Liu
Yumao Lu
Lijuan Wang
31
111
0
23 Nov 2021
Florence: A New Foundation Model for Computer Vision
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
75
882
0
22 Nov 2021
Class-agnostic Object Detection with Multi-modal Transformer
Class-agnostic Object Detection with Multi-modal Transformer
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad Shahbaz Khan
Rao Muhammad Anwer
Ming-Hsuan Yang
23
91
0
22 Nov 2021
Many Heads but One Brain: Fusion Brain -- a Competition and a Single
  Multimodal Multitask Architecture
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture
Daria Bakshandaeva
Denis Dimitrov
V.Ya. Arkhipkin
Alex Shonenkov
M. Potanin
...
Mikhail Martynov
Anton Voronov
Vera Davydova
E. Tutubalina
Aleksandr Petiushko
45
0
0
22 Nov 2021
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
29
57
0
19 Nov 2021
Learning to Compose Visual Relations
Learning to Compose Visual Relations
Nan Liu
Shuang Li
Yilun Du
J. Tenenbaum
Antonio Torralba
CoGe
OCL
32
77
0
17 Nov 2021
Transparent Human Evaluation for Image Captioning
Transparent Human Evaluation for Image Captioning
Jungo Kasai
Keisuke Sakaguchi
Lavinia Dunagan
Jacob Morrison
Ronan Le Bras
Yejin Choi
Noah A. Smith
33
47
0
17 Nov 2021
Achieving Human Parity on Visual Question Answering
Achieving Human Parity on Visual Question Answering
Ming Yan
Haiyang Xu
Chenliang Li
Junfeng Tian
Bin Bi
...
Ji Zhang
Songfang Huang
Fei Huang
Luo Si
Rong Jin
32
12
0
17 Nov 2021
Explainable Semantic Space by Grounding Language to Vision with
  Cross-Modal Contrastive Learning
Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning
Yizhen Zhang
Minkyu Choi
Kuan Han
Zhongming Liu
VLM
23
15
0
13 Nov 2021
Visual Intelligence through Human Interaction
Visual Intelligence through Human Interaction
Ranjay Krishna
Mitchell L. Gordon
Fei-Fei Li
Michael S. Bernstein
29
8
0
12 Nov 2021
Full Characterization of Adaptively Strong Majority Voting in
  Crowdsourcing
Full Characterization of Adaptively Strong Majority Voting in Crowdsourcing
M. Boyarskaya
Panagiotis G. Ipeirotis
18
0
0
11 Nov 2021
A Survey of Visual Transformers
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
79
332
0
11 Nov 2021
Towards Calibrated Model for Long-Tailed Visual Recognition from Prior
  Perspective
Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective
Zhengzhuo Xu
Zenghao Chai
Chun Yuan
70
52
0
06 Nov 2021
An Empirical Study of Training End-to-End Vision-and-Language
  Transformers
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Zi-Yi Dou
Yichong Xu
Zhe Gan
Jianfeng Wang
Shuohang Wang
...
Pengchuan Zhang
Lu Yuan
Nanyun Peng
Zicheng Liu
Michael Zeng
VLM
38
369
0
03 Nov 2021
Perceptual Score: What Data Modalities Does Your Model Perceive?
Perceptual Score: What Data Modalities Does Your Model Perceive?
Itai Gat
Idan Schwartz
Alex Schwing
44
30
0
27 Oct 2021
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language
  Navigation
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
A. Moudgil
Arjun Majumdar
Harsh Agrawal
Stefan Lee
Dhruv Batra
LM&Ro
27
57
0
27 Oct 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual
  Language Reasoning
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
41
184
0
25 Oct 2021
Single-Modal Entropy based Active Learning for Visual Question Answering
Single-Modal Entropy based Active Learning for Visual Question Answering
Dong-Jin Kim
Jae-Won Cho
Jinsoo Choi
Yunjae Jung
In So Kweon
27
12
0
21 Oct 2021
StructFormer: Learning Spatial Structure for Language-Guided Semantic
  Rearrangement of Novel Objects
StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects
Weiyu Liu
Chris Paxton
Tucker Hermans
Dieter Fox
40
92
0
19 Oct 2021
Towards Optimal Correlational Object Search
Towards Optimal Correlational Object Search
Kaiyu Zheng
Rohan Chitnis
Yoonchang Sung
George Konidaris
Stefanie Tellex
21
20
0
19 Oct 2021
A Picture is Worth a Thousand Words: A Unified System for Diverse
  Captions and Rich Images Generation
A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation
Yupan Huang
Bei Liu
Jianlong Fu
Yutong Lu
DiffM
27
5
0
19 Oct 2021
Unifying Multimodal Transformer for Bi-directional Image and Text
  Generation
Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang
Hongwei Xue
Bei Liu
Yutong Lu
19
57
0
19 Oct 2021
Self-Annotated Training for Controllable Image Captioning
Self-Annotated Training for Controllable Image Captioning
Zhangzi Zhu
Tianlei Wang
Hong Qu
29
2
0
16 Oct 2021
Automatic Modeling of Social Concepts Evoked by Art Images as Multimodal
  Frames
Automatic Modeling of Social Concepts Evoked by Art Images as Multimodal Frames
Delfina Sol Martinez Pandiani
Valentina Presutti
24
4
0
14 Oct 2021
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy
  Labels
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels
Mohit Sharma
Rajkumar Patra
Harshali Desai
Shruti Vyas
Yogesh S Rawat
R. Shah
VGen
NoLa
24
3
0
13 Oct 2021
Topic Scene Graph Generation by Attention Distillation from Caption
Topic Scene Graph Generation by Attention Distillation from Caption
Wenbin Wang
R. Wang
X. Chen
DiffM
30
14
0
12 Oct 2021
Accessible Visualization via Natural Language Descriptions: A Four-Level
  Model of Semantic Content
Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content
Alan Lundgard
Arvind Satyanarayan
25
128
0
08 Oct 2021
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real
  Images
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
Zhuowan Li
Elias Stengel-Eskin
Yixiao Zhang
Cihang Xie
Q. Tran
Benjamin Van Durme
Alan Yuille
VLM
26
15
0
01 Oct 2021
Grounding Predicates through Actions
Grounding Predicates through Actions
Toki Migimatsu
Jeannette Bohg
150
33
0
29 Sep 2021
Multimodal Integration of Human-Like Attention in Visual Question
  Answering
Multimodal Integration of Human-Like Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Philippe Muller
Dominike Thomas
Mihai Bâce
Andreas Bulling
41
16
0
27 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual
  Question Answering
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
40
19
0
27 Sep 2021
OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset
  with Visual Contexts
OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts
Shuhe Wang
Yuxian Meng
Xiaoya Li
Xiaofei Sun
Rongbin Ouyang
Jiwei Li
MLLM
VLM
32
21
0
27 Sep 2021
MLIM: Vision-and-Language Model Pre-training with Masked Language and
  Image Modeling
MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling
Tarik Arici
M. S. Seyfioglu
T. Neiman
Yi Tian Xu
Son N. Tran
Trishul Chilimbi
Belinda Zeng
Ismail B. Tutar
VLM
16
15
0
24 Sep 2021
Visual Scene Graphs for Audio Source Separation
Visual Scene Graphs for Audio Source Separation
Moitreya Chatterjee
Jonathan Le Roux
Narendra Ahuja
A. Cherian
26
36
0
24 Sep 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
211
221
0
24 Sep 2021
Dense Contrastive Visual-Linguistic Pretraining
Dense Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Gao
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLM
SSL
57
10
0
24 Sep 2021
Scene Graph Generation for Better Image Captioning?
Scene Graph Generation for Better Image Captioning?
Maximilian Mozes
Martin Schmitt
Vladimir Golkov
Hinrich Schütze
Daniel Cremers
GNN
31
3
0
23 Sep 2021
WRENCH: A Comprehensive Benchmark for Weak Supervision
WRENCH: A Comprehensive Benchmark for Weak Supervision
Jieyu Zhang
Yue Yu
Yinghao Li
Yujing Wang
Yaming Yang
Mao Yang
Alexander Ratner
22
111
0
23 Sep 2021
Previous
123...121314...222324
Next