Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,644 papers shown
Generating Directed Graphs with Dual Attention and Asymmetric Encoding
Alba Carballo-Castro
Manuel Madeira
Yiming Qin
D. Thanou
Pascal Frossard
19
0
0
19 Jun 2025
AutoV: Learning to Retrieve Visual Prompt for Large Vision-Language Models
Yuan Zhang
Chun-Kai Fan
Tao Huang
Ming Lu
Sicheng Yu
Junwen Pan
Kuan Cheng
Qi She
Shanghang Zhang
VLM, LRM
19
0
0
19 Jun 2025
A Spatial Relationship Aware Dataset for Robotics
Peng Wang
Minh Huy Pham
Zhihao Guo
Wei Zhou
3DPC
7
0
0
14 Jun 2025
Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs
Xiao Xu
L. Qin
Wanxiang Che
Min-Yen Kan
MoE, VLM
34
0
0
13 Jun 2025
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Ning Gao
Yilun Chen
Shuai Yang
Xinyi Chen
Yang Tian
Hao Li
Haifeng Huang
Hanqing Wang
Tai Wang
Jiangmiao Pang
LM&Ro
129
0
0
12 Jun 2025
HalLoc: Token-level Localization of Hallucinations for Vision Language Models
Eunkyu Park
Minyeong Kim
Gunhee Kim
MLLM, HILM, VLM
140
0
0
12 Jun 2025
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang
Mengzhen Liu
Lichen Li
Ming Lu
Yuan Zhang
Junwen Pan
Qi She
Shanghang Zhang
VLM
122
0
0
12 Jun 2025
AIR: Zero-shot Generative Model Adaptation with Iterative Refinement
Guimeng Liu
Milad Abdollahzadeh
Ngai-Man Cheung
VLM
119
0
0
12 Jun 2025
Vision Generalist Model: A Survey
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
68
0
0
11 Jun 2025
Open World Scene Graph Generation using Vision Language Models
Amartya Dutta
Kazi Sajeed Mehrab
Medha Sawhney
Abhilash Neog
Mridul Khurana
...
Aanish Pradhan
M. Maruf
Ismini Lourentzou
Arka Daw
Anuj Karpatne
VLM
20
0
0
09 Jun 2025
Synthetic Visual Genome
J. S. Park
Zixian Ma
Linjie Li
Chenhao Zheng
Cheng-Yu Hsieh
...
Quan Kong
Norimasa Kobori
Ali Farhadi
Yejin Choi
Ranjay Krishna
21
0
0
09 Jun 2025
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
Tianyi Bai
Yuxuan Fan
Jiantao Qiu
Fupeng Sun
Jiayi Song
Junlin Han
Zichen Liu
Conghui He
Wentao Zhang
Binhang Yuan
MLLM, VLM
26
0
0
08 Jun 2025
Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
Akash Gupta
Amos Storkey
Mirella Lapata
VLM
43
0
0
07 Jun 2025
PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments
Minghao Zou
Qingtian Zeng
Yongping Miao
Shangkun Liu
Zilong Wang
Hantao Liu
Wei Zhou
22
0
0
07 Jun 2025
Hallucinate, Ground, Repeat: A Framework for Generalized Visual Relationship Detection
Shanmukha Vellamcheti
Sanjoy Kundu
Sathyanarayanan N. Aakur
53
0
0
06 Jun 2025
TextVidBench: A Benchmark for Long Video Scene Text Understanding
Yangyang Zhong
Ji Qi
Yuan Yao
Pengxin Luo
Yunfeng Yan
Donglian Qi
Zhiyuan Liu
Tat-Seng Chua
99
0
0
05 Jun 2025
CIVET: Systematic Evaluation of Understanding in VLMs
Massimo Rizzoli
Simone Alghisi
Olha Khomyn
Gabriel Roccabruna
Seyed Mahed Mousavi
Giuseppe Riccardi
172
0
0
05 Jun 2025
Refer to Anything with Vision-Language Prompts
Shengcao Cao
Zijun Wei
Jason Kuen
Kangning Liu
Lingzhi Zhang
Jiuxiang Gu
HyunJoon Jung
Liang-Yan Gui
Yu Wang
VLM
117
0
0
05 Jun 2025
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?
Yang Yao
Lingyu Li
Jiaxin Song
Chiyu Chen
Zhenqi He
...
Xin Wang
Tianle Gu
Jie Li
Yan Teng
Yingchun Wang
LRM
19
0
0
03 Jun 2025
Geometric Visual Fusion Graph Neural Networks for Multi-Person Human-Object Interaction Recognition in Videos
Tanqiu Qiao
Ruochen Li
Frederick W. B. Li
Yoshiki Kubotani
Shigeo Morishima
Hubert P. H. Shum
25
0
0
03 Jun 2025
Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck
Yuwen Tan
Yuan Qing
Boqing Gong
45
0
0
30 May 2025
Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal Transport
Yuu Jinnai
OT
48
0
0
29 May 2025
A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation
Shuzhou Sun
Li Liu
Tianpeng Liu
Shuaifeng Zhi
Ming-Ming Cheng
J. Heikkilä
Yongxiang Liu
CML
245
0
0
29 May 2025
Spatial Knowledge Graph-Guided Multimodal Synthesis
Yida Xue
Zhen Bi
Jinnan Yang
Jungang Lou
Ningyu Zhang
N. Zhang
57
0
0
28 May 2025
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
Hao Fang
Changle Zhou
Jiawei Kong
Kuofeng Gao
Bin Chen
Tao Liang
Guojun Ma
Shu-Tao Xia
MLLM
115
0
0
26 May 2025
From Data to Modeling: Fully Open-vocabulary Scene Graph Generation
Zuyao Chen
Jinlin Wu
Zhen Lei
Chang Wen Chen
49
0
0
26 May 2025
LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study
Dongil Yang
Minjin Kim
Sunghwan Kim
Beong-woo Kwak
Minjun Park
Jinseok Hong
Woontack Woo
Jinyoung Yeo
60
0
0
26 May 2025
Causal-LLaVA: Causal Disentanglement for Mitigating Hallucination in Multimodal Large Language Models
Xinmiao Hu
C. Wang
Ruihe An
ChenYu Shao
Xiaojun Ye
Sheng Zhou
Liangcheng Li
MLLM, LRM
63
0
0
26 May 2025
Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation
Daniel Csizmadia
Andrei Codreanu
Victor Sim
Vighnesh Prabhu
Michael Lu
Kevin Zhu
Sean O'Brien
Vasu Sharma
CLIP, VLM
71
0
0
25 May 2025
Reasoning Segmentation for Images and Videos: A Survey
Yiqing Shen
Chenjia Li
Fei Xiong
Jeong-O Jeong
Tianpeng Wang
Michael Latman
Mathias Unberath
VOS
244
0
0
24 May 2025
ToDRE: Visual Token Pruning via Diversity and Task Awareness for Efficient Large Vision-Language Models
Duo Li
Zuhao Yang
Shijian Lu
VLM
96
0
0
24 May 2025
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
Donghwan Chi
Hyomin Kim
Yoonjin Oh
Yongjin Kim
Donghoon Lee
DaeJin Jo
Jongmin Kim
Junyeob Baek
Sungjin Ahn
Sungwoong Kim
MLLM, VLM
484
0
0
23 May 2025
Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion
Jacob A. Hansen
Wei Lin
Junmo Kang
M. Jehanzeb Mirza
Hongyin Luo
Rogerio Feris
Alan Ritter
James R. Glass
Leonid Karlinsky
VLM
244
0
0
23 May 2025
Panoptic Captioning: Seeking An Equivalency Bridge for Image and Text
Kun-Yu Lin
Hongjun Wang
Weining Ren
Kai Han
294
0
0
22 May 2025
Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models
Jiachen Jiang
Jinxin Zhou
Bo Peng
Xia Ning
Zhihui Zhu
102
0
0
22 May 2025
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
Siting Li
Xiang Gao
Simon Shaolei Du
132
0
0
21 May 2025
TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks
Yuanze Hu
Zhaoxin Fan
Xinyu Wang
Gen Li
Ye Qiu
...
Wenjun Wu
Kejian Wu
Yifan Sun
Xiaotie Deng
Jin Song Dong
62
0
0
19 May 2025
Top-Down Compression: Revisit Efficient Vision Token Projection for Visual Instruction Tuning
Bonan li
Zicheng Zhang
Songhua Liu
Weihao Yu
Xinchao Wang
VLM
142
0
0
17 May 2025
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures
Shun Inadumi
Nobuhiro Ueda
Koichiro Yoshino
ObjD
80
0
0
16 May 2025
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
Elodie Germani
Selin Türk Ilayda
Zeineddine Fatima
Mourad Charbel
Shadi Albarqouni
AI4CE
115
0
0
14 May 2025
Behind Maya: Building a Multilingual Vision Language Model
Nahid Alam
Karthik Reddy Kanjula
Surya Guthikonda
Timothy Chung
Bala Krishna S Vegesna
...
Isha Chaturvedi
Genta Indra Winata
Ashvanth.S
Snehanshu Mukherjee
Alham Fikri Aji
MLLM, VLM
78
0
0
13 May 2025
Imagine, Verify, Execute: Memory-Guided Agentic Exploration with Vision-Language Models
Seungjae Lee
Daniel Ekpo
Haowen Liu
Furong Huang
Abhinav Shrivastava
Jia-Bin Huang
LM&Ro
145
0
0
12 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
82
0
0
11 May 2025
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration
Huajie Tan
Xiaoshuai Hao
Cheng Chi
Minglan Lin
Yaoxu Lyu
...
Yulong Ao
Yonghua Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
LM&Ro
124
0
0
06 May 2025
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Liqiang Jing
Guiming Hardy Chen
Ehsan Aghazadeh
Xin Eric Wang
Xinya Du
135
0
0
04 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Kai Zhang
Lizhuang Ma
Jiangming Wang
Jun Wang
Weinan Zhang
Wei Zhang
MQ
82
0
0
01 May 2025
InstructAttribute: Fine-grained Object Attributes editing with Instruction
Xingxi Yin
Jingfeng Zhang
Zhi Li
You Li
Yanzhe Zhang
Yin Zhang
DiffM
455
1
0
01 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
Jieneng Chen
LRM
127
1
0
01 May 2025
What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift
Jiamin Chang
Haoyang Li
Hammond Pearce
Ruoxi Sun
Yue Liu
Minhui Xue
85
0
0
28 Apr 2025
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
Zehao Wang
Senthil Purushwalkam
Caiming Xiong
Siyang Song
Chenhui Xu
Ran Xu
173
2
0
23 Apr 2025