Vision-Language Models for Vision Tasks: A Survey
arXiv:2304.00685

3 April 2023
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
    VLM

Papers citing "Vision-Language Models for Vision Tasks: A Survey"

50 / 115 papers shown
IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models
Tuo An
Yunjiao Zhou
Han Zou
Jianfei Yang
LRM
34
4
0
03 Oct 2024
Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker
Xinlong Hou
Sen Shen
Xueshen Li
Xinran Gao
Ziyi Huang
Steven J. Holiday
Matthew R. Cribbet
Susan W. White
Edward Sazonov
Yu Gan
36
0
0
02 Oct 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Zhecan Wang
Junzhang Liu
Chia-Wei Tang
Hani Alomari
Anushka Sivakumar
...
Haoxuan You
A. Ishmam
Kai-Wei Chang
Shih-Fu Chang
Chris Thomas
CoGe
VLM
69
2
0
19 Sep 2024
Bootstrapping Object-level Planning with Large Language Models
D. Paulius
Alejandro Agostini
Benedict Quartey
George Konidaris
LM&Ro
43
1
0
18 Sep 2024
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection
Yaning Zhang
Tianyi Wang
Zitong Yu
Zan Gao
Linlin Shen
Shengyong Chen
DiffM
76
3
0
15 Sep 2024
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
Zhixian He
Pengcheng Zhao
Fuwei Zhang
Shujin Lin
46
0
0
14 Sep 2024
Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology
Pei Liu
Luping Ji
Jiaxiang Gou
Bo Fu
Mao Ye
41
2
0
14 Sep 2024
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
Peiming Guo
Sinuo Liu
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
Hao Fei
DiffM
50
1
0
16 Aug 2024
Target Prompting for Information Extraction with Vision Language Model
Dipankar Medhi
VLM
40
0
0
07 Aug 2024
GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM
Keshav Bimbraw
Ye Wang
Jing Liu
T. Koike-Akino
VLM
MedIm
LM&MA
42
1
0
15 Jul 2024
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Jinliang Lu
Ziliang Pang
Min Xiao
Yaochen Zhu
Rui Xia
Jiajun Zhang
MoMe
59
18
0
08 Jul 2024
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
Hao Liang
Jiapeng Li
Tianyi Bai
Xijie Huang
Linzhuang Sun
Zhengren Wang
Conghui He
Bin Cui
Chong Chen
Wentao Zhang
VGen
34
7
0
03 Jul 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM
ObjD
86
1
0
21 Jun 2024
MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning
Shuo Xu
Sai Wang
Xinyue Hu
Yutian Lin
Bo Du
Yu Wu
CoGe
59
1
0
18 Jun 2024
Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings
Keno Moenck
Duc Trung Thieu
Julian Koch
Thorsten Schuppstuhl
VLM
31
0
0
14 Jun 2024
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
Shuvendu Roy
Yasaman Parhizkar
Franklin Ogidi
Vahid Reza Khazaie
Michael Colacci
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
VLM
54
1
0
11 Jun 2024
Language-guided Detection and Mitigation of Unknown Dataset Bias
Zaiying Zhao
Soichiro Kumano
Toshihiko Yamasaki
53
2
0
05 Jun 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
45
0
23 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
46
30
0
18 May 2024
Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors
Jiachen Sun
Changsheng Wang
Jiong Wang
Yiwei Zhang
Chaowei Xiao
AAML
VLM
39
3
0
17 May 2024
Contextual Emotion Recognition using Large Vision Language Models
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
VLM
73
3
0
14 May 2024
Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI
Gyeong-Geon Lee
Xiaoming Zhai
43
5
0
12 May 2024
Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark
Evan M. Williams
Kathleen M. Carley
CoGe
44
0
0
10 May 2024
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
Maxime Zanella
Ismail Ben Ayed
VLM
MLLM
56
23
0
03 May 2024
EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model
Deng Li
Xin Liu
Bohao Xing
Baiqiang Xia
Yuan Zong
Bihan Wen
Heikki Kälviäinen
42
3
0
01 May 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
84
3
0
29 Apr 2024
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Yu Xia
Rui Wang
Xu Liu
Mingyan Li
Tong Yu
Xiang Chen
Julian McAuley
Shuai Li
LRM
59
19
0
24 Apr 2024
Privacy Preserving Prompt Engineering: A Survey
Kennedy Edemacu
Xintao Wu
63
18
0
09 Apr 2024
ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models
Vishnunandan L. N. Venkatesh
Byung-Cheol Min
LM&Ro
79
2
0
02 Apr 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng
Baoyu Jing
Zihao Li
Hanghang Tong
Jingrui He
VLM
51
19
0
30 Mar 2024
Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making
Shuai Ma
Qiaoyi Chen
Xinru Wang
Chengbo Zheng
Zhenhui Peng
Ming Yin
Xiaojuan Ma
ELM
42
20
0
25 Mar 2024
To Help or Not to Help: LLM-based Attentive Support for Human-Robot Group Interactions
Daniel Tanneberg
Felix Ocker
Stephan Hasler
Joerg Deigmoeller
Anna Belardinelli
Chao Wang
H. Wersing
Bernhard Sendhoff
Michael Gienger
LM&Ro
61
14
0
19 Mar 2024
Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification
Long Lan
Fengxiang Wang
Shuyan Li
Xiangtao Zheng
Zengmao Wang
Xinwang Liu
VLM
31
8
0
13 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
52
8
0
29 Feb 2024
Domain Adaptation for Large-Vocabulary Object Detectors
Kai Jiang
Jiaxing Huang
Weiying Xie
Jie Lei
Yunsong Li
Ling Shao
Shijian Lu
ObjD
VLM
42
2
0
13 Jan 2024
CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare
Akash Ghosh
Arkadeep Acharya
Raghav Jain
Sriparna Saha
Aman Chadha
Setu Sinha
35
29
0
16 Dec 2023
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
Jiahao Xie
Wei Li
Xiangtai Li
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
DiffM
VLM
72
35
0
22 Sep 2023
Few-shot medical image classification with simple shape and texture text descriptors using vision-language models
Michal Byra
M. F. Rachmadi
Henrik Skibbe
VLM
43
6
0
08 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
43
119
0
25 Jul 2023
A Survey of Label-Efficient Deep Learning for 3D Point Clouds
Aoran Xiao
Xiaoqin Zhang
Ling Shao
Shijian Lu
3DPC
43
18
0
31 May 2023
The Rise of AI Language Pathologists: Exploring Two-level Prompt Learning for Few-shot Weakly-supervised Whole Slide Image Classification
Linhao Qu
X. Luo
Kexue Fu
Manning Wang
Zhijian Song
46
22
0
29 May 2023
Differentially Private Attention Computation
Yeqi Gao
Zhao Song
Xin Yang
55
21
0
08 May 2023
The Potential of Visual ChatGPT For Remote Sensing
L. Osco
Eduardo Lopes de Lemos
W. Gonçalves
A. P. Ramos
J. M. Junior
25
30
0
25 Apr 2023
Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
Hantao Yao
Rui Zhang
Changsheng Xu
VLM
VPVLM
130
204
0
23 Mar 2023
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Luting Wang
Yi Liu
Penghui Du
Zihan Ding
Yue Liao
Qiaosong Qi
Biaolong Chen
Si Liu
ObjD
VLM
73
62
0
10 Mar 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIP
VLM
49
44
0
06 Mar 2023
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
VPVLM
VLM
212
538
0
06 Oct 2022
CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
Ziyu Guo
Renrui Zhang
Longtian Qiu
Xianzheng Ma
Xupeng Miao
Xuming He
Bin Cui
VLM
AAML
66
110
0
28 Sep 2022
UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
Janghyeon Lee
Jongsuk Kim
Hyounguk Shon
Bumsoo Kim
Seung Wook Kim
Honglak Lee
Junmo Kim
CLIP
VLM
54
54
0
27 Sep 2022
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
Lewei Yao
Jianhua Han
Youpeng Wen
Xiaodan Liang
Dan Xu
Wei Zhang
Zhenguo Li
Chunjing Xu
Hang Xu
CLIP
VLM
115
153
0
20 Sep 2022