2304.00685
Cited By
Vision-Language Models for Vision Tasks: A Survey
3 April 2023
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
Papers citing
"Vision-Language Models for Vision Tasks: A Survey"
50 / 115 papers shown
Title
TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
Zeqing Wang
Shiyuan Zhang
Chengpei Tang
Keze Wang
LRM
14
0
0
21 May 2025
Uncovering Cultural Representation Disparities in Vision-Language Models
Ram Mohan Rao Kadiyala
Siddhant Gupta
Jebish Purbey
Srishti Yadav
Alejandro Salamanca
Desmond Elliott
7
0
0
20 May 2025
Unifying Segment Anything in Microscopy with Multimodal Large Language Model
Manyu Li
Ruian He
Zixian Zhang
Weimin Tan
Bo Yan
VLM
13
0
0
16 May 2025
Vision language models have difficulty recognizing virtual objects
Tyler Tran
Sangeet Khemlani
J. Gregory Trafton
19
0
0
15 May 2025
Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
Muzammil Behzad
VLM
28
0
0
14 May 2025
Open the Eyes of MPNN: Vision Enhances MPNN in Link Prediction
Yanbin Wei
Xuehao Wang
Zhan Zhuang
Yang Chen
Shuhao Chen
Yulong Zhang
Yu-Jie Zhang
James T. Kwok
39
1
0
13 May 2025
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
50
0
0
13 May 2025
CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding
Wenxuan Ma
Xiaoge Cao
Yujie Zhang
Chaofan Zhang
Shaobo Yang
Peng Hao
Bin Fang
Yinghao Cai
Shaowei Cui
Shuo Wang
41
0
0
13 May 2025
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
58
0
0
12 May 2025
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Xilin Jiang
Junkai Wu
Vishal B. Choudhari
N. Mesgarani
VLM
35
0
0
11 May 2025
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
Agnese Chiatti
Sara Bernardini
Lara Shibelski Godoy Piccolo
Viola Schiaffonati
Matteo Matteucci
67
0
0
08 May 2025
Web2Grasp: Learning Functional Grasps from Web Images of Hand-Object Interactions
Hongyi Chen
Yunchao Yao
Yufei Ye
Zhixuan Xu
Homanga Bharadhwaj
Jiashun Wang
Shubham Tulsiani
Zackory Erickson
Jeffrey Ichnowski
40
0
0
07 May 2025
Learning Knowledge-based Prompts for Robust 3D Mask Presentation Attack Detection
Fangling Jiang
Qi Li
Bing Liu
Weining Wang
Caifeng Shan
Zhenan Sun
Ming-Hsuan Yang
218
0
0
06 May 2025
Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images
Fangling Jiang
Qi Li
Weining Wang
Wei Shen
Bing Liu
Zhenan Sun
AAML
41
0
0
06 May 2025
DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving
Xinmeng Hou
Wuqi Wang
Long Yang
Hao Lin
Jinglun Feng
Haigen Min
Xiangmo Zhao
42
0
0
04 May 2025
Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin
Yuchen Wang
X. Bai
Xiaochen Li
Weili Guan
Liqiang Nie
Xinyang Chen
VLM
49
0
0
04 May 2025
Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model
Muzammil Behzad
Guoying Zhao
VLM
51
0
0
28 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Zichen Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
99
0
0
28 Apr 2025
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
Junfei Wu
Hao Yang
Xinhua Zeng
Guibing He
Zhengzhang Chen
Zhu Li
Xinming Zhang
Yangyang Ma
Run Fang
Yang Liu
LRM
175
0
0
12 Apr 2025
COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails
Miguel Espinosa
V. Marsocci
Yuru Jia
Elliot J. Crowley
Mikolaj Czerkawski
DiffM
57
0
0
11 Apr 2025
M2IV: Towards Efficient and Fine-grained Multimodal In-Context Learning in Large Vision-Language Models
Yanshu Li
Hongyang He
Yi Cao
Qisen Cheng
Xiang Fu
Ruixiang Tang
VLM
47
0
0
06 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
75
0
0
01 Apr 2025
Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning
Ashim Dahal
Saydul Akbar Murad
Nick Rahimi
VLM
53
0
0
30 Mar 2025
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding
Hao Guo
Jianfei Zhu
Wei Fan
Chunzhi Yi
Feng Jiang
ObjD
68
0
0
25 Mar 2025
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
...
Jinghua Yan
Y. Bai
P. Sadayappan
Xia Hu
Bo Yuan
VLM
64
0
0
24 Mar 2025
Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning
Zhe Hu
Jing Li
Yu Yin
Hou Pong Chan
Yu Yin
VLM
66
0
0
21 Mar 2025
NdLinear Is All You Need for Representation Learning
Alex Reneau
Jerry Yao-Chieh Hu
Zhongfang Zhuang
Ting-Chun Liu
HAI
44
0
0
21 Mar 2025
Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models
Xiaojun Jia
Sensen Gao
Simeng Qin
Ke Ma
Xianrui Li
Yihao Huang
Wei Dong
Yang Liu
Xiaochun Cao
AAML
VLM
60
2
0
17 Mar 2025
Generalizable and Explainable Deep Learning for Medical Image Computing: An Overview
A. Chaddad
Yan Hu
Yihang Wu
Binbin Wen
R. Kateb
58
6
0
11 Mar 2025
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation
Yinuo Liu
Zenghui Yuan
Guiyao Tie
Jiawen Shi
Lichao Sun
Neil Zhenqiang Gong
48
1
0
08 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
212
0
0
05 Mar 2025
HVI: A New Color Space for Low-light Image Enhancement
Qingsen Yan
Yixu Feng
Cheng Zhang
Guansong Pang
Kangbiao Shi
Peng Wu
Wei Dong
Jinqiu Sun
Yanning Zhang
49
5
0
27 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
75
3
0
11 Feb 2025
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
LRM
68
8
0
04 Feb 2025
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta
David Khachaturov
Robert D. Mullins
AAML
AuLLM
69
2
0
02 Feb 2025
Human Re-ID Meets LVLMs: What can we expect?
Kailash A. Hambarde
Pranita Samale
Hugo Proença
68
0
0
30 Jan 2025
ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
Yanming Xiu
T. Scargill
M. Gorlatova
77
2
0
22 Jan 2025
OpenIN: Open-Vocabulary Instance-Oriented Navigation in Dynamic Domestic Environments
Yujie Tang
Ming Wang
Yinan Deng
Zibo Zheng
Jingchuan Deng
Yufeng Yue
LM&Ro
41
0
0
08 Jan 2025
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng
Qiguang Chen
Jin Zhang
Hao Fei
Xiaocheng Feng
Wanxiang Che
Min Li
L. Qin
VLM
MLLM
LRM
77
5
0
17 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
79
0
0
05 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
106
5
0
05 Dec 2024
Expanding Event Modality Applications through a Robust CLIP-Based Encoder
SungHeon Jeong
Hanning Chen
Sanggeon Yun
Suhyeon Cho
Wenjun Huang
Xiangjian Liu
Mohsen Imani
103
2
0
04 Dec 2024
Libra: Leveraging Temporal Images for Biomedical Radiology Analysis
Xi Zhang
Zaiqiao Meng
Jake Lever
Edmond S. L. Ho
MedIm
101
1
0
28 Nov 2024
GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection
Jiyul Ham
Yonggon Jung
Jun-Geol Baek
VLM
48
1
0
09 Nov 2024
Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification
Shengxun Wei
Zan Gao
Yibo Zhao
Weili Guan
Shengyong Chen
56
2
0
01 Nov 2024
GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing
Hosam Elgendy
Ahmed Sharshar
Ahmed Aboeitta
Yasser Ashraf
Mohsen Guizani
35
2
0
25 Oct 2024
AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
Yongjian Wu
Yang Zhou
Jiya Saiyin
Bingzheng Wei
M. Lai
Jianzhong Shou
Yan Xu
VLM
MedIm
29
1
0
22 Oct 2024
GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs
Yun Zhu
Haizhou Shi
Xiaotang Wang
Yongchao Liu
Yaoke Wang
Boci Peng
Chuntao Hong
Siliang Tang
VLM
63
8
0
14 Oct 2024
Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting
Purushothaman Natarajan
Kamal Basha
Athira Nambiar
DiffM
32
0
0
11 Oct 2024
Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)
Abhijit Mishra
Shreya Shukla
Jose Torres
Jacek Gwizdka
Shounak Roychowdhury
53
4
0
10 Oct 2024