ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1602.07332
  4. Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense
  Image Annotations

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
ArXivPDFHTML

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,034 papers shown
Title
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
96
6
0
25 Nov 2024
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Rohith Peddi
Saurabh
Ayush Abhay Shrivastava
Parag Singla
Vibhav Gogate
82
0
0
20 Nov 2024
Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy
Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy
Y. Lin
Ronald R. Coifman
Gal Mishne
Ronen Talmon
40
2
0
28 Oct 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
43
0
0
26 Oct 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li
Zhiqiu Lin
Wenxuan Peng
Jean de Dieu Nyandwi
Daniel Jiang
Zixian Ma
Simran Khanuja
Ranjay Krishna
Graham Neubig
Deva Ramanan
AAML
CoGe
VLM
71
21
0
18 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for
  Vision-Language Pre-Training
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
56
9
0
16 Oct 2024
Locality Alignment Improves Vision-Language Models
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
72
4
0
14 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
70
25
0
10 Oct 2024
OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and
  Open-Vocabulary Semantic Scene Graphs
OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs
Venkata Naren Devarakonda
Raktim Gautam Goswami
Ali Umut Kaypak
Naman Patel
Rooholla Khorrambakht
Prashanth Krishnamurthy
Farshad Khorrami
LM&Ro
39
3
0
08 Oct 2024
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Deqing Fu
Tong Xiao
Rui Wang
Wang Zhu
Pengchuan Zhang
Guan Pang
Robin Jia
Lawrence Chen
68
5
0
07 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
89
26
0
04 Oct 2024
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
Hong Li
Nanxi Li
Yuanjie Chen
Jianbin Zhu
Qinlu Guo
Cewu Lu
Yong-Lu Li
MLLM
39
1
0
02 Oct 2024
Advancing Video Quality Assessment for AIGC
Advancing Video Quality Assessment for AIGC
Xinli Yue
Jianhui Sun
Han Kong
Liangchao Yao
Tianyi Wang
...
Jing Lv
Fan Xia
Yuetang Deng
Qian Wang
Lingchen Zhao
VGen
EGVM
26
0
0
23 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
63
10
0
23 Sep 2024
FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs
FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs
Jing Hao
Yuxiang Zhao
Song Chen
Yanpeng Sun
Qiang Chen
Gang Zhang
Kun Yao
Errui Ding
Jingdong Wang
VLM
VGen
MLLM
51
5
0
20 Sep 2024
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
56
3
0
16 Sep 2024
Reasoning Paths with Reference Objects Elicit Quantitative Spatial
  Reasoning in Large Vision-Language Models
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
Yuan-Hong Liao
Rafid Mahmood
Sanja Fidler
David Acuna
ReLM
LRM
39
10
0
15 Sep 2024
ComAlign: Compositional Alignment in Vision-Language Models
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe
VLM
30
0
0
12 Sep 2024
What Makes a Maze Look Like a Maze?
What Makes a Maze Look Like a Maze?
Joy Hsu
Jiayuan Mao
J. Tenenbaum
Noah D. Goodman
Jiajun Wu
OCL
59
6
0
12 Sep 2024
Securing Vision-Language Models with a Robust Encoder Against Jailbreak
  and Adversarial Attacks
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
Md Zarif Hossain
Ahmed Imteaj
AAML
VLM
48
3
0
11 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring
  Expression Segmentation
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
42
11
0
01 Sep 2024
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
Junyao Ge
Yang Zheng
Kaitai Guo
Jimin Liang
Jimin Liang
41
1
0
27 Aug 2024
Towards Deconfounded Image-Text Matching with Causal Inference
Towards Deconfounded Image-Text Matching with Causal Inference
Wenhui Li
Xinqi Su
Dan Song
Lanjun Wang
Kun Zhang
An-An Liu
BDL
CML
53
10
0
22 Aug 2024
Masked Image Modeling: A Survey
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
72
6
0
13 Aug 2024
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language
  Models
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
VLM
40
5
0
08 Aug 2024
Attacks and Defenses for Generative Diffusion Models: A Comprehensive
  Survey
Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey
V. T. Truong
Luan Ba Dang
Long Bao Le
DiffM
MedIm
56
16
0
06 Aug 2024
Modelling Visual Semantics via Image Captioning to extract Enhanced
  Multi-Level Cross-Modal Semantic Incongruity Representation with Attention
  for Multimodal Sarcasm Detection
Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection
Sajal Aggarwal
Ananya Pandey
Dinesh Kumar Vishwakarma
43
1
0
05 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM
MLLM
66
19
0
04 Aug 2024
BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation
BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation
Peng Hao
Xiaobing Wang
Yingying Jiang
Hanchao Jia
Xiaoshuai Hao
Shaowei Cui
Junhang Wei
Xiaoshuai Hao
57
3
0
26 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal
  Reasoning
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
45
9
0
22 Jul 2024
Learning Visual Grounding from Generative Vision and Language Model
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
36
5
0
18 Jul 2024
Reflective Instruction Tuning: Mitigating Hallucinations in Large
  Vision-Language Models
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang
Teng Wang
Haigang Zhang
Ping Lu
Feng Zheng
MLLM
LRM
VLM
39
3
0
16 Jul 2024
A Fair Ranking and New Model for Panoptic Scene Graph Generation
A Fair Ranking and New Model for Panoptic Scene Graph Generation
Julian Lorenz
Alexander Pest
Daniel Kienzle
K. Ludwig
Rainer Lienhart
49
1
0
12 Jul 2024
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Zijie Yue
Miaojing Shi
Hanli Wang
Shuai Ding
Qijun Chen
Shanlin Yang
45
0
0
11 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
66
4
0
09 Jul 2024
Curriculum Learning with Quality-Driven Data Selection
Curriculum Learning with Quality-Driven Data Selection
Biao Wu
Fang Meng
Ling-Hao Chen
36
2
0
27 Jun 2024
On Efficient Language and Vision Assistants for Visually-Situated
  Natural Language Understanding: What Matters in Reading and Reasoning
On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning
Geewook Kim
Minjoon Seo
VLM
44
2
0
17 Jun 2024
Composing Object Relations and Attributes for Image-Text Matching
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham
Chuong Huynh
Ser-Nam Lim
Abhinav Shrivastava
CoGe
44
4
0
17 Jun 2024
Object-Attribute-Relation Representation Based Video Semantic Communication
Object-Attribute-Relation Representation Based Video Semantic Communication
Qiyuan Du
Yiping Duan
Qianqian Yang
Xiaoming Tao
Mérouane Debbah
66
2
0
15 Jun 2024
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for
  Remote Sensing Vision-Language Understanding
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding
Junwei Luo
Zhen Pang
Yongjun Zhang
Tingzhu Wang
Linlin Wang
...
Jiangwei Lao
Jian Wang
Jingdong Chen
Yihua Tan
Yansheng Li
48
23
0
14 Jun 2024
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
Hanqing Wang
Zeguan Xiao
Shuo Wang
Guanhua Chen
Guanhua Chen
44
19
0
13 Jun 2024
What If We Recaption Billions of Web Images with LLaMA-3?
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li
Haoqin Tu
Mude Hui
Zeyu Wang
Bingchen Zhao
...
Jieru Mei
Qing Liu
Huangjie Zheng
Yuyin Zhou
Cihang Xie
VLM
MLLM
44
35
0
12 Jun 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal
  Large Language Models
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu
Zeyang Zhou
Kexin Huang
Dandan Liang
Yixu Wang
...
Keqing Wang
Yujiu Yang
Yan Teng
Yu Qiao
Yingchun Wang
ELM
50
13
0
11 Jun 2024
AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video
  Grounding
AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding
Xing Zhang
Jiaxi Gu
Haoyu Zhao
Shicong Wang
Hang Xu
Renjing Pei
Songcen Xu
Zuxuan Wu
Yu-Gang Jiang
46
0
0
11 Jun 2024
Situated Ground Truths: Enhancing Bias-Aware AI by Situating Data Labels
  with SituAnnotate
Situated Ground Truths: Enhancing Bias-Aware AI by Situating Data Labels with SituAnnotate
Delfina Sol Martinez Pandiani
Valentina Presutti
24
1
0
10 Jun 2024
F-LMM: Grounding Frozen Large Multimodal Models
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
80
12
0
09 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
65
32
0
07 Jun 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and
  Effective for LMMs
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Lingchen Meng
Jianwei Yang
Rui Tian
Xiyang Dai
Zuxuan Wu
Jianfeng Gao
Yu-Gang Jiang
VLM
30
9
0
06 Jun 2024
Multi-Modal Generative Embedding Model
Multi-Modal Generative Embedding Model
Feipeng Ma
Hongwei Xue
Guangting Wang
Yizhou Zhou
Fengyun Rao
Shilin Yan
Yueyi Zhang
Siying Wu
Mike Zheng Shou
Xiaoyan Sun
VLM
39
3
0
29 May 2024
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
Guan-Bo Wang
Zhiming Li
Qingchao Chen
Yang Liu
43
9
0
27 May 2024
Previous
12345...192021
Next