2407.21534
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
31 July 2024
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
Papers citing "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
13 / 13 papers shown
Title
Exploring Multimodal Prompt for Visualization Authoring with Large Language Models
Zhen Wen
Luoxuan Weng
Yinghao Tang
Runjin Zhang
Y. Liu
Bo Pan
Minfeng Zhu
Wei Chen
LRM
19
0
0
18 Apr 2025
Towards Online Multi-Modal Social Interaction Understanding
X. Li
Shijian Deng
Bolin Lai
Weiguo Pian
James M. Rehg
Yapeng Tian
46
0
0
25 Mar 2025
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Hao Fei
Shengqiong Wu
H. Zhang
Tat-Seng Chua
Shuicheng Yan
64
37
0
31 Dec 2024
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
Yu Zhao
Hao Fei
Xiangtai Li
L. Qin
Jiayi Ji
Hongyuan Zhu
Meishan Zhang
M. Zhang
Jianguo Wei
DiffM
26
1
0
20 Oct 2024
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang
Yinpeng Dong
Siyuan Zhang
Tianzan Min
Hang Su
Jun Zhu
LRM
VLM
44
5
0
17 Apr 2024
Cross-Modality Perturbation Synergy Attack for Person Re-identification
Yunpeng Gong
Zhun Zhong
Zhiming Luo
Yansong Qu
Rongrong Ji
Min Jiang
AAML
27
19
0
18 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
164
922
0
21 Dec 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chun-yue Li
Jianwei Yang
87
68
0
05 Dec 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
118
375
0
07 Nov 2023
Training-Free Layout Control with Cross-Attention Guidance
Minghao Chen
Iro Laina
Andrea Vedaldi
DiffM
132
221
0
06 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
267
4,229
0
30 Jan 2023
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
Manli Shu
Weili Nie
De-An Huang
Zhiding Yu
Tom Goldstein
Anima Anandkumar
Chaowei Xiao
VLM
VPVLM
183
280
0
15 Sep 2022
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukás Neumann
Jirí Matas
Serge J. Belongie
185
515
0
26 Jan 2016