arXiv:2306.04356
Fine-Grained Visual Prompting
7 June 2023
Lingfeng Yang, Yueze Wang, Xiang Li, Xinlong Wang, Jian Yang
ObjD, VLM
Papers citing "Fine-Grained Visual Prompting" (15 papers)
AutoV: Learning to Retrieve Visual Prompt for Large Vision-Language Models
Yuan Zhang, Chun-Kai Fan, Tao Huang, Ming Lu, Sicheng Yu, Junwen Pan, Kuan Cheng, Qi She, Shanghang Zhang
VLM, LRM
19 Jun 2025
MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling
Liang Yin, Xudong Xie, Zhang Li, Xiang Bai, Yuliang Liu
LRM
12 Jun 2025
Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning
Ye Mo, Zirui Shao, Kai Ye, Xianwei Mao, Bo Zhang, ..., Gang Huang, Kehan Chen, Zhou Huan, Zixu Yan, Sheng Zhou
LRM
24 May 2025
Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models
Lucas Choi, Ross Greer
VLM
14 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang, Hao Zhang
VLM
03 May 2025
Visual and Textual Prompts in VLLMs for Enhancing Emotion Recognition
Zhifeng Wang, Qixuan Zhang, Peter Zhang, Wenjia Niu, Kaihao Zhang, Ramesh Sankaranarayana, Sabrina Caldwell, Tom Gedeon
24 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma, Jing Ding, Xuejun Zhang, Dezhi Luo, Jiahe Ding, Sihan Xu, Yuchen Huang, Run Peng, Joyce Chai
22 Apr 2025
Towards Automated Semantic Interpretability in Reinforcement Learning via Vision-Language Models
Zhaoxin Li, Zhang Xi-Jia, Batuhan Altundas, Letian Chen, Rohan R. Paleja, Matthew C. Gombolay
OffRL
20 Mar 2025
Visual Position Prompt for MLLM based Visual Grounding
Wei Tang, Yanpeng Sun, Qinying Gu, Zechao Li
VLM
19 Mar 2025
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister
3DGS, VLM
13 Mar 2025
Introducing Visual Perception Token into Multimodal Large Language Model
Runpeng Yu, Xinyin Ma, Xinchao Wang
MLLM, LRM
24 Feb 2025
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Xiaoshuai Sun, Rongrong Ji
MLLM
31 Jul 2024
Robust Adaptation of Foundation Models with Black-Box Visual Prompting
Changdae Oh, Gyeongdeok Seo, Geunyoung Jung, Zhi-Qi Cheng, Hosik Choi, Jiyoung Jung, Kyungwoo Song
VLM
04 Jul 2024
Data-Efficient 3D Visual Grounding via Order-Aware Referring
Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang
25 Mar 2024
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model
Huan Ma, Yan Zhu, Changqing Zhang, Peilin Zhao, Baoyuan Wu, Long-Kai Huang, Qinghua Hu, Bing Wu
VLM
01 Mar 2024