ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1602.07332
  4. Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense
  Image Annotations

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
ArXiv (abs)PDFHTML

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 1,645 papers shown
Title
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
169
3
0
13 Jun 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal
  Large Language Models
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu
Zeyang Zhou
Kexin Huang
Dandan Liang
Yixu Wang
...
Keqing Wang
Yujiu Yang
Yan Teng
Yu Qiao
Yingchun Wang
ELM
89
19
0
11 Jun 2024
Needle In A Multimodal Haystack
Needle In A Multimodal Haystack
Weiyun Wang
Shuibo Zhang
Yiming Ren
Yuchen Duan
Tiantong Li
...
Ping Luo
Yu Qiao
Jifeng Dai
Wenqi Shao
Wenhai Wang
VLM
114
24
0
11 Jun 2024
F-LMM: Grounding Frozen Large Multimodal Models
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
184
15
0
09 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
157
37
0
07 Jun 2024
WebUOT-1M: Advancing Deep Underwater Object Tracking with A
  Million-Scale Benchmark
WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark
Chunhui Zhang
Li Liu
Guanjie Huang
Hao Wen
Xi Zhou
Yanfeng Wang
131
9
0
30 May 2024
Multi-Modal Generative Embedding Model
Multi-Modal Generative Embedding Model
Feipeng Ma
Hongwei Xue
Guangting Wang
Yizhou Zhou
Fengyun Rao
Shilin Yan
Yueyi Zhang
Siying Wu
Mike Zheng Shou
Xiaoyan Sun
VLM
69
4
0
29 May 2024
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning
Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning
Neha Kalibhat
Priyatham Kattakinda
Arman Zarei
Nikita Seleznev
Sam Sharpe
Samuel Sharpe
Senthil Kumar
Soheil Feizi
ViT
73
0
0
26 May 2024
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Run Luo
Yunshui Li
Longze Chen
Wanwei He
Ting-En Lin
...
Zikai Song
Xiaobo Xia
Tongliang Liu
Min Yang
Binyuan Hui
VLMDiffM
188
22
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
335
54
0
23 May 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi-dong Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Hao Liu
Xiang Bai
Can Huang
Xiang Bai
Can Huang
187
28
0
20 May 2024
"Set It Up!": Functional Object Arrangement with Compositional Generative Models
"Set It Up!": Functional Object Arrangement with Compositional Generative Models
Yiqing Xu
Jiayuan Mao
Yilun Du
Tomás Lozano-Pérez
L. Kaelbling
David Hsu
LM&Ro
175
6
0
20 May 2024
Efficient Multimodal Large Language Models: A Survey
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
113
58
0
17 May 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B. Tenenbaum
Chuang Gan
157
196
0
15 May 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul
Zhizhong Li
Hao Yang
Yonatan Dukler
Ashwin Swaminathan
C. Taylor
Stefano Soatto
HILM
166
18
0
08 May 2024
POV Learning: Individual Alignment of Multimodal Models using Human Perception
POV Learning: Individual Alignment of Multimodal Models using Human Perception
Simon Werner
Katharina Christ
Laura Bernardy
Marion G. Müller
Achim Rettinger
33
0
0
07 May 2024
DOCCI: Descriptions of Connected and Contrasting Images
DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe
Sunayana Rane
Zachary Berger
Yonatan Bitton
Jaemin Cho
...
Zarana Parekh
Jordi Pont-Tuset
Garrett Tanzer
Su Wang
Jason Baldridge
117
63
0
30 Apr 2024
Hallucination of Multimodal Large Language Models: A Survey
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLMLRM
258
197
0
29 Apr 2024
3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial
  Self-Highlighting
3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting
Xuri Ge
Songpei Xu
Fuhai Chen
Jie Wang
Guoxin Wang
Shan An
Joemon M. Jose
3DPC
110
12
0
26 Apr 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal
  Models with Open-Source Suites
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLMVLM
159
644
0
25 Apr 2024
Multi-Head Mixture-of-Experts
Multi-Head Mixture-of-Experts
Xun Wu
Shaohan Huang
Wenhui Wang
Furu Wei
MoE
97
15
0
23 Apr 2024
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Wenyi Xiao
Ziwei Huang
Leilei Gan
Wanggui He
Haoyuan Li
Zhelun Yu
Hao Jiang
Leilei Gan
Linchao Zhu
MLLM
104
34
0
22 Apr 2024
ECOR: Explainable CLIP for Object Recognition
ECOR: Explainable CLIP for Object Recognition
Ali Rasekh
Sepehr Kazemi Ranjbar
Milad Heidari
Wolfgang Nejdl
VLM
113
4
0
19 Apr 2024
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Quan Van Nguyen
Dan Quang Tran
Huy Quang Pham
Thang Kien-Bao Nguyen
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
CoGe
172
5
0
16 Apr 2024
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Tianyu Zhu
M. Jung
Jesse Clark
134
1
0
12 Apr 2024
AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene
  Graph Generation
AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation
Yansheng Li
Kun Li
Yongjun Zhang
Linlin Wang
Dingwen Zhang
130
3
0
11 Apr 2024
ObjBlur: A Curriculum Learning Approach With Progressive Object-Level
  Blurring for Improved Layout-to-Image Generation
ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation
Stanislav Frolov
Brian B. Moser
Sebastián M. Palacio
Andreas Dengel
DiffM
57
0
0
11 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question
  Answering
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
85
1
0
01 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
184
47
0
29 Mar 2024
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion
  for 3D Scene Graph Alignment and Its Downstream Tasks
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Yaxu Xie
A. Pagani
Didier Stricker
116
4
0
28 Mar 2024
Toward Interactive Regional Understanding in Vision-Large Language
  Models
Toward Interactive Regional Understanding in Vision-Large Language Models
Jungbeom Lee
Sanghyuk Chun
Sangdoo Yun
VLM
82
3
0
27 Mar 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
246
12
0
25 Mar 2024
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao
Min Zhang
Wei Zhao
Pengxiang Ding
Siteng Huang
Donglin Wang
Mamba
119
74
0
21 Mar 2024
Hierarchical Spatial Proximity Reasoning for Vision-and-Language
  Navigation
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation
Ming Xu
Zilong Xie
100
2
0
18 Mar 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling
  and Visual-Language Co-Referring
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan
Yousong Zhu
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
ObjD
98
14
0
14 Mar 2024
VLM-PL: Advanced Pseudo Labeling Approach for Class Incremental Object
  Detection via Vision-Language Model
VLM-PL: Advanced Pseudo Labeling Approach for Class Incremental Object Detection via Vision-Language Model
Junsu Kim
Yunhoe Ku
Jihyeon Kim
Junuk Cha
Seungryul Baek
ObjDVLM
97
14
0
08 Mar 2024
Yi: Open Foundation Models by 01.AI
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLMLRM
317
576
0
07 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
86
15
0
06 Mar 2024
How to Understand "Support"? An Implicit-enhanced Causal Inference
  Approach for Weakly-supervised Phrase Grounding
How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding
Jiamin Luo
Jianing Zhao
Jingjing Wang
Guodong Zhou
64
0
0
29 Feb 2024
Grounding Language Models for Visual Entity Recognition
Grounding Language Models for Visual Entity Recognition
Zilin Xiao
Ming Gong
Paola Cascante-Bonilla
Xingyao Zhang
Jie Wu
Vicente Ordonez
VLM
97
10
0
28 Feb 2024
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
Xiujie Song
Mengyue Wu
Ke Zhu
Chunhao Zhang
Yanyi Chen
LRMELM
134
3
0
28 Feb 2024
VCD: A Dataset for Visual Commonsense Discovery in Images
VCD: A Dataset for Visual Commonsense Discovery in Images
Xiangqing Shen
Yurun Song
Siwei Wu
Rui Xia
113
6
0
27 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRMVLM
135
64
0
19 Feb 2024
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for
  Medical LVLM
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Yutao Hu
Tian-Xin Li
Quanfeng Lu
Wenqi Shao
Junjun He
Yu Qiao
Ping Luo
ELMLM&MA
87
67
0
14 Feb 2024
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald
Nimrod Barazani
Cees G. M. Snoek
Yuki M. Asano
VLMMLLM
59
12
0
13 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
241
116
0
08 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
175
7
0
08 Feb 2024
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
Siwei Wu
Yizhi Li
Kang Zhu
Ge Zhang
Yiming Liang
...
Wenhu Chen
Wenhao Huang
Noura Al Moubayed
Jie Fu
Chenghua Lin
98
13
0
24 Jan 2024
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document
  Understanding with Instructions
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka
Taichi Iki
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
72
23
0
24 Jan 2024
Text-to-Image Cross-Modal Generation: A Systematic Review
Text-to-Image Cross-Modal Generation: A Systematic Review
Maciej Żelaszczyk
Jacek Mańdziuk
97
4
0
21 Jan 2024
Previous
12345...313233
Next