Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,644 papers shown
Generative Landmarks Guided Eyeglasses Removal 3D Face Reconstruction
Dapeng Zhao
Yue Qi
3DH CVBM 3DV
103
1
0
31 Dec 2024
Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering
Junxiao Xue
Quan Deng
Fei Yu
Yanhao Wang
Jun Wang
Yongqian Li
VLM
129
5
0
31 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
135
0
0
18 Dec 2024
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
Yaqi Zhao
Yuanyang Yin
Lin Li
Mingan Lin
Victor Shea-Jay Huang
Siwei Chen
Xin Wu
Baoqun Yin
Guosheng Dong
Wentao Zhang
136
1
0
25 Nov 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
247
20
0
25 Nov 2024
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Rohith Peddi
Saurabh
Ayush Abhay Shrivastava
Parag Singla
Vibhav Gogate
161
2
0
20 Nov 2024
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Sunil Aryal
Imran Razzak
Hakim Hacid
87
0
0
30 Oct 2024
Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy
Ya-Wei Eileen Lin
Ronald R. Coifman
Zhengchao Wan
Ronen Talmon
197
3
0
28 Oct 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
126
1
0
26 Oct 2024
ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla
Deeparghya Dutta Barua
Md Sakib Ul Rahman Sourove
Md Fahim
Fabiha Haider
Fariha Tanjim Shifat
Md Tasmim Rahman Adib
Anam Borhan Uddin
Md Farhan Ishmam
Md Farhad Alam
81
0
0
19 Oct 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li
Zhiqiu Lin
Wenxuan Peng
Jean de Dieu Nyandwi
Daniel Jiang
Zixian Ma
Simran Khanuja
Ranjay Krishna
Graham Neubig
Deva Ramanan
AAML CoGe VLM
222
31
0
18 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
120
9
0
16 Oct 2024
Overcoming Domain Limitations in Open-vocabulary Segmentation
Dongjun Hwang
Seong Joon Oh
Junsuk Choe
SSeg OOD
141
0
0
15 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
267
7
0
14 Oct 2024
Can We Predict Performance of Large Models across Vision-Language Tasks?
Qinyu Zhao
Ming Xu
Kartik Gupta
Akshay Asthana
Liang Zheng
Stephen Gould
128
0
0
14 Oct 2024
Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets
Thomas Eiter
Jan Hadl
N. Higuera
J. Oetsch
51
0
0
12 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
124
7
0
10 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM MLLM
163
34
0
10 Oct 2024
HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation
Xinyu Zhou
Simin Fan
Martin Jaggi
TDI
96
1
0
07 Oct 2024
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Deqing Fu
Tong Xiao
Rui Wang
Wang Zhu
Pengchuan Zhang
Guan Pang
Robin Jia
Lawrence Chen
160
7
0
07 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
219
37
0
04 Oct 2024
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
Hong Li
Nanxi Li
Yuanjie Chen
Jianbin Zhu
Qinlu Guo
Cewu Lu
Yong-Lu Li
MLLM
111
1
0
02 Oct 2024
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
Joshua Forster Feinglass
Yezhou Yang
63
0
0
30 Sep 2024
ComiCap: A VLMs pipeline for dense captioning of Comic Panels
Emanuele Vivoli
Niccoló Biondi
Marco Bertini
Dimosthenis Karatzas
73
4
0
24 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM DiffM
192
14
0
23 Sep 2024
FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs
Bowen Yan
Zhengsong Zhang
Liqiang Jing
Eftekhar Hossain
Xinya Du
118
3
0
20 Sep 2024
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
130
4
0
16 Sep 2024
What Makes a Maze Look Like a Maze?
Joy Hsu
Jiayuan Mao
J. Tenenbaum
Noah D. Goodman
Jiajun Wu
OCL
130
6
0
12 Sep 2024
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
Junyao Ge
Xu Zhang
Yang Zheng
Kaitai Guo
Jimin Liang
171
2
0
27 Aug 2024
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang
Bin Shan
Wei Shi
Kun-Yu Lin
Xiang Fei
Guozhi Tang
Lei Liao
Jingqun Tang
Can Huang
Wei-Shi Zheng
MLLM VLM
185
17
0
23 Aug 2024
Towards Deconfounded Image-Text Matching with Causal Inference
Wenhui Li
Xinqi Su
Dan Song
Lanjun Wang
Kun Zhang
An-An Liu
BDL CML
87
11
0
22 Aug 2024
RConE: Rough Cone Embedding for Multi-Hop Logical Query Answering on Multi-Modal Knowledge Graphs
Mayank Kharbanda
R. Shah
Raghava Mutharaju
87
0
0
21 Aug 2024
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality
Chaofan Tao
Gukyeong Kwon
Varad Gunjal
Hao Yang
Zhaowei Cai
Yonatan Dukler
Ashwin Swaminathan
R. Manmatha
Colin Jon Taylor
Stefano Soatto
CoGe
63
0
0
18 Aug 2024
Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models
Kening Zheng
Junkai Chen
Yibo Yan
Xin Zou
Xuming Hu
229
7
0
18 Aug 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue
Manli Shu
Anas Awadalla
Jun Wang
An Yan
...
Zeyuan Chen
Silvio Savarese
Juan Carlos Niebles
Caiming Xiong
Ran Xu
VLM
108
96
0
16 Aug 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
192
8
0
13 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM MLLM
210
31
0
04 Aug 2024
BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation
Peng Hao
Xiaobing Wang
Yingying Jiang
Hanchao Jia
Xiaoshuai Hao
Shaowei Cui
Junhang Wei
150
3
0
26 Jul 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
S. Swetha
Jinyu Yang
T. Neiman
Mamshad Nayeem Rizve
Son Tran
Benjamin Z. Yao
Trishul Chilimbi
Mubarak Shah
112
2
0
18 Jul 2024
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
Xiuquan Hou
Mei-qin Liu
Senlin Zhang
Ping Wei
Badong Chen
Xuguang Lan
ViT
100
17
0
16 Jul 2024
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang
Teng Wang
Haigang Zhang
Ping Lu
Feng Zheng
MLLM LRM VLM
90
4
0
16 Jul 2024
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
Philipp Allgeuer
Kyra Ahrens
Stefan Wermter
CLIP VLM
92
3
0
15 Jul 2024
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Zijie Yue
Miaojing Shi
Hanli Wang
Shuai Ding
Qijun Chen
Shanlin Yang
107
0
0
11 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
150
5
0
09 Jul 2024
SADL: An Effective In-Context Learning Method for Compositional Visual QA
Long Hoang Dang
T. Le
Vuong Le
Tu Minh Phuong
Truyen Tran
ReLM CoGe
99
3
0
02 Jul 2024
Curriculum Learning with Quality-Driven Data Selection
Biao Wu
Fang Meng
113
2
0
27 Jun 2024
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham
Chuong Huynh
Ser-Nam Lim
Abhinav Shrivastava
CoGe
77
8
0
17 Jun 2024
Object-Attribute-Relation Representation Based Video Semantic Communication
Qiyuan Du
Yiping Duan
Qianqian Yang
Xiaoming Tao
Mérouane Debbah
128
3
0
15 Jun 2024
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding
Junwei Luo
Zhen Pang
Yongjun Zhang
Tingzhu Wang
Linlin Wang
...
Jiangwei Lao
Jian Wang
Jingdong Chen
Yihua Tan
Yansheng Li
132
27
0
14 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
169
3
0
13 Jun 2024