2311.17048
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
28 November 2023
Zeyu Han
Fangrui Zhu
Qianru Lao
Huaizu Jiang
ObjD
Papers citing
"Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions"
13 / 13 papers shown
Title
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
96
2
0
08 Mar 2025
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
Yishuo Wang
Jingchen Ni
Yong-Jin Liu
Chun Yuan
Yansong Tang
58
1
0
02 Mar 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
67
4
0
31 Dec 2024
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
227
0
0
01 Dec 2024
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu
Linchao Zhu
Yi Yang
32
3
0
16 Oct 2024
Towards Flexible Visual Relationship Segmentation
Fangrui Zhu
Jianwei Yang
Huaizu Jiang
VOS
34
1
0
15 Aug 2024
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
36
6
0
18 Jul 2024
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Bin Ren
Yawei Li
Nancy Mehta
Radu Timofte
Hongyuan Yu
...
P. Yashaswini
Chaitra Desai
R. Tabib
Ujwala Patil
U. Mudenagudi
SupR
52
35
0
16 Apr 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
150
319
0
21 Mar 2024
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
YiXuan Wu
Yizhou Wang
Shixiang Tang
Wenhao Wu
Tong He
Wanli Ouyang
Jian Wu
Philip Torr
ObjD
VLM
32
19
0
19 Mar 2024
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
211
221
0
24 Sep 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
67
255
0
14 Jul 2021
Improving Distantly-Supervised Relation Extraction through BERT-based Label & Instance Embeddings
Despina Christou
Grigorios Tsoumakas
29
39
0
01 Feb 2021