2402.08360
Cited By
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks
13 February 2024
Jusung Lee, Sungguk Cha, Younghyun Lee, Cheoljong Yang
MLLM, LRM
Papers citing "Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks"
6 / 6 papers shown
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
Zhiyuan Liu, Yuting Zhang, Feng Liu, Changwang Zhang, Ying Sun, Jun Wang
LRM
74 · 4 · 0
20 Mar 2025

Out-of-Distribution Radar Detection in Compound Clutter and Thermal Noise through Variational Autoencoders
Y A Rouzoumka, E Terreaux, C. Morisseau, J. Ovarlez, C. Ren
59 · 0 · 0
06 Mar 2025

Reducing Hallucinations in Vision-Language Models via Latent Space Steering
Sheng Liu, Haotian Ye, Lei Xing, James Zou
VLM, LLMSV
58 · 5 · 0
21 Oct 2024

Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
Zonghao Ying, Aishan Liu, Tianyuan Zhang, Zhengmin Yu, Siyuan Liang, Xianglong Liu, Dacheng Tao
AAML
46 · 26 · 0
06 Jun 2024

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
VLM, MLLM
322 · 4,300 · 0
30 Jan 2023

LAVIS: A Library for Language-Vision Intelligence
Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Guosheng Lin
VLM
131 · 52 · 0
15 Sep 2022