Vision-Language Models for Vision Tasks: A Survey


3 April 2023
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
    VLM

Papers citing "Vision-Language Models for Vision Tasks: A Survey"

50 / 115 papers shown
Title
TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
Zeqing Wang
Shiyuan Zhang
Chengpei Tang
Keze Wang
LRM
14
0
0
21 May 2025
Uncovering Cultural Representation Disparities in Vision-Language Models
Ram Mohan Rao Kadiyala
Siddhant Gupta
Jebish Purbey
Srishti Yadav
Alejandro Salamanca
Desmond Elliott
7
0
0
20 May 2025
Unifying Segment Anything in Microscopy with Multimodal Large Language Model
Manyu Li
Ruian He
Zixian Zhang
Weimin Tan
Bo Yan
VLM
13
0
0
16 May 2025
Vision language models have difficulty recognizing virtual objects
Tyler Tran
Sangeet Khemlani
J. Gregory Trafton
19
0
0
15 May 2025
Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
Muzammil Behzad
VLM
28
0
0
14 May 2025
Open the Eyes of MPNN: Vision Enhances MPNN in Link Prediction
Yanbin Wei
Xuehao Wang
Zhan Zhuang
Yang Chen
Shuhao Chen
Yulong Zhang
Yu-Jie Zhang
James T. Kwok
39
1
0
13 May 2025
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
50
0
0
13 May 2025
CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding
Wenxuan Ma
Xiaoge Cao
Yujie Zhang
Chaofan Zhang
Shaobo Yang
Peng Hao
Bin Fang
Yinghao Cai
Shaowei Cui
Shuo Wang
41
0
0
13 May 2025
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
58
0
0
12 May 2025
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Xilin Jiang
Junkai Wu
Vishal B. Choudhari
N. Mesgarani
VLM
35
0
0
11 May 2025
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
Agnese Chiatti
Sara Bernardini
Lara Shibelski Godoy Piccolo
Viola Schiaffonati
Matteo Matteucci
67
0
0
08 May 2025
Web2Grasp: Learning Functional Grasps from Web Images of Hand-Object Interactions
Hongyi Chen
Yunchao Yao
Yufei Ye
Zhixuan Xu
Homanga Bharadhwaj
Jiashun Wang
Shubham Tulsiani
Zackory Erickson
Jeffrey Ichnowski
40
0
0
07 May 2025
Learning Knowledge-based Prompts for Robust 3D Mask Presentation Attack Detection
Fangling Jiang
Qi Li
Bing Liu
Weining Wang
Caifeng Shan
Zhenan Sun
Ming-Hsuan Yang
218
0
0
06 May 2025
Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images
Fangling Jiang
Qi Li
Weining Wang
Wei Shen
Bing Liu
Zhenan Sun
AAML
41
0
0
06 May 2025
DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving
Xinmeng Hou
Wuqi Wang
Long Yang
Hao Lin
Jinglun Feng
Haigen Min
Xiangmo Zhao
42
0
0
04 May 2025
Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin
Yuchen Wang
X. Bai
Xiaochen Li
Weili Guan
Liqiang Nie
Xinyang Chen
VLM
49
0
0
04 May 2025
Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model
Muzammil Behzad
Guoying Zhao
VLM
51
0
0
28 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Zichen Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
99
0
0
28 Apr 2025
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
Junfei Wu
Hao Yang
Xinhua Zeng
Guibing He
Zhengzhang Chen
Zhu Li
Xinming Zhang
Yangyang Ma
Run Fang
Yang Liu
LRM
175
0
0
12 Apr 2025
COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails
Miguel Espinosa
V. Marsocci
Yuru Jia
Elliot J. Crowley
Mikolaj Czerkawski
DiffM
57
0
0
11 Apr 2025
M2IV: Towards Efficient and Fine-grained Multimodal In-Context Learning in Large Vision-Language Models
Yanshu Li
Hongyang He
Yi Cao
Qisen Cheng
Xiang Fu
Ruixiang Tang
VLM
47
0
0
06 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
75
0
0
01 Apr 2025
Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning
Ashim Dahal
Saydul Akbar Murad
Nick Rahimi
VLM
53
0
0
30 Mar 2025
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding
Hao Guo
Jianfei Zhu
Wei Fan
Chunzhi Yi
Feng Jiang
ObjD
68
0
0
25 Mar 2025
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
...
Jinghua Yan
Y. Bai
P. Sadayappan
Xia Hu
Bo Yuan
VLM
64
0
0
24 Mar 2025
Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning
Zhe Hu
Jing Li
Yu Yin
Hou Pong Chan
Yu Yin
VLM
66
0
0
21 Mar 2025
NdLinear Is All You Need for Representation Learning
Alex Reneau
Jerry Yao-Chieh Hu
Zhongfang Zhuang
Ting-Chun Liu
HAI
44
0
0
21 Mar 2025
Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models
Xiaojun Jia
Sensen Gao
Simeng Qin
Ke Ma
Xianrui Li
Yihao Huang
Wei Dong
Yang Liu
Xiaochun Cao
AAML
VLM
60
2
0
17 Mar 2025
Generalizable and Explainable Deep Learning for Medical Image Computing: An Overview
A. Chaddad
Yan Hu
Yihang Wu
Binbin Wen
R. Kateb
58
6
0
11 Mar 2025
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation
Yinuo Liu
Zenghui Yuan
Guiyao Tie
Jiawen Shi
Lichao Sun
Neil Zhenqiang Gong
48
1
0
08 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
212
0
0
05 Mar 2025
HVI: A New Color Space for Low-light Image Enhancement
Qingsen Yan
Yixu Feng
Cheng Zhang
Guansong Pang
Kangbiao Shi
Peng Wu
Wei Dong
Jinqiu Sun
Yanning Zhang
49
5
0
27 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
75
3
0
11 Feb 2025
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
LRM
68
8
0
04 Feb 2025
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta
David Khachaturov
Robert D. Mullins
AAML
AuLLM
69
2
0
02 Feb 2025
Human Re-ID Meets LVLMs: What can we expect?
Kailash A. Hambarde
Pranita Samale
Hugo Proença
68
0
0
30 Jan 2025
ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
Yanming Xiu
T. Scargill
M. Gorlatova
77
2
0
22 Jan 2025
OpenIN: Open-Vocabulary Instance-Oriented Navigation in Dynamic Domestic Environments
Yujie Tang
Ming Wang
Yinan Deng
Zibo Zheng
Jingchuan Deng
Yufeng Yue
LM&Ro
41
0
0
08 Jan 2025
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng
Qiguang Chen
Jin Zhang
Hao Fei
Xiaocheng Feng
Wanxiang Che
Min Li
L. Qin
VLM
MLLM
LRM
77
5
0
17 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
79
0
0
05 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
106
5
0
05 Dec 2024
Expanding Event Modality Applications through a Robust CLIP-Based Encoder
SungHeon Jeong
Hanning Chen
Sanggeon Yun
Suhyeon Cho
Wenjun Huang
Xiangjian Liu
Mohsen Imani
103
2
0
04 Dec 2024
Libra: Leveraging Temporal Images for Biomedical Radiology Analysis
Xi Zhang
Zaiqiao Meng
Jake Lever
Edmond S. L. Ho
MedIm
101
1
0
28 Nov 2024
GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection
Jiyul Ham
Yonggon Jung
Jun-Geol Baek
VLM
48
1
0
09 Nov 2024
Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification
Shengxun Wei
Zan Gao
Yibo Zhao
Weili Guan
Shengyong Chen
56
2
0
01 Nov 2024
GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing
Hosam Elgendy
Ahmed Sharshar
Ahmed Aboeitta
Yasser Ashraf
Mohsen Guizani
35
2
0
25 Oct 2024
AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
Yongjian Wu
Yang Zhou
Jiya Saiyin
Bingzheng Wei
M. Lai
Jianzhong Shou
Yan Xu
VLM
MedIm
29
1
0
22 Oct 2024
GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs
Yun Zhu
Haizhou Shi
Xiaotang Wang
Yongchao Liu
Yaoke Wang
Boci Peng
Chuntao Hong
Siliang Tang
VLM
63
8
0
14 Oct 2024
Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting
Purushothaman Natarajan
Kamal Basha
Athira Nambiar
DiffM
32
0
0
11 Oct 2024
Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)
Abhijit Mishra
Shreya Shukla
Jose Torres
Jacek Gwizdka
Shounak Roychowdhury
53
4
0
10 Oct 2024