2304.03307
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
6 April 2023
Syed Talal Wasim
Muzammal Naseer
Salman Khan
F. Khan
M. Shah
VLM
VPVLM
Papers citing
"Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting"
50 / 65 papers shown
Title
AnimalMotionCLIP: Embedding motion in CLIP for Animal Behavior Analysis
Enmin Zhong
Carlos R. del-Blanco
Daniel Berjón
F. Jaureguizar
Narciso N. García
34
0
0
30 Apr 2025
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Divya Velayudhan
A. Ahmed
Mohamad Alansari
Neha Gour
Abderaouf Behouch
...
Muzammal Naseer
Juergen Gall
Mohammed Bennamoun
Ernesto Damiani
Naoufel Werghi
47
0
0
03 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
41
0
0
02 Apr 2025
Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions
Thinesh Thiyakesan Ponbagavathi
Alina Roitberg
39
0
0
31 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Pengfei Zhu
Q. Hu
CLIP
48
0
0
24 Mar 2025
Prompt2LVideos: Exploring Prompts for Understanding Long-Form Multimodal Videos
Soumya Jahagirdar
Jayasree Saha
C. V. Jawahar
56
0
0
11 Mar 2025
Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity
Xiaohao Xu
Feng Xue
Xianrui Li
Haowei Li
S. M. I. Simon X. Yang
T. Zhang
Matthew Johnson-Roberson
Xiaonan Huang
3DV
43
0
0
08 Mar 2025
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu
Congqi Cao
Yifan Zhang
Yanning Zhang
VLM
43
0
0
27 Feb 2025
Parameter-Efficient Fine-Tuning for Foundation Models
Dan Zhang
Tao Feng
Lilong Xue
Yuandong Wang
Yuxiao Dong
J. Tang
46
8
0
23 Jan 2025
AFANet: Adaptive Frequency-Aware Network for Weakly-Supervised Few-Shot Semantic Segmentation
Jiaqi Ma
Guo-Sen Xie
Fang Zhao
Zechao Li
32
0
0
23 Dec 2024
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen
Zizheng Huang
Y. Hong
Yanshuo Wang
Zhongcai Lyu
Zhuoer Xu
Jun Lan
Zhangxuan Gu
VLM
54
0
0
18 Nov 2024
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
Wentao Bao
K. Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM
ObjD
42
2
0
17 Nov 2024
BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping
Taolin Zhang
J. T. Wang
Hang Guo
Tao Dai
Bin Chen
Shu-Tao Xia
VLM
TTA
19
0
0
20 Oct 2024
Storyboard guided Alignment for Fine-grained Video Action Recognition
Enqi Liu
Liyuan Pan
Yan Yang
Yiran Zhong
Zhijing Wu
Xinxiao Wu
Liu Liu
33
0
0
18 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
37
1
0
14 Oct 2024
Continual Learning Improves Zero-Shot Action Recognition
Shreyank N. Gowda
Davide Moltisanti
Laura Sevilla-Lara
BDL
VLM
CLL
27
1
0
14 Oct 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViT
VLM
34
4
0
06 Sep 2024
Spatio-Temporal Context Prompting for Zero-Shot Action Detection
Wei-Jhe Huang
Min-Hung Chen
Shang-Hong Lai
27
0
0
28 Aug 2024
Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition
Bozheng Li
Mushui Liu
Gaoang Wang
Yunlong Yu
26
5
0
22 Aug 2024
OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
Mushui Liu
Bozheng Li
Yunlong Yu
VLM
23
9
0
12 Aug 2024
Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition
Congqi Cao
Guibiao Liao
Yating Yu
Kanglin Liu
Lingtong Min
Yanning Zhang
32
4
0
01 Aug 2024
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Tz-Ying Wu
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
EgoV
55
0
0
28 Jul 2024
Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification
Linhao Qu
Dingkang Yang
Dan Huang
Qinhao Guo
Rongkui Luo
Shaoting Zhang
Xiaosong Wang
VLM
61
6
0
15 Jul 2024
Open Vocabulary Multi-Label Video Classification
Rohit Gupta
Mamshad Nayeem Rizve
Jayakrishnan Unnikrishnan
Ashish Tawari
Son Tran
Mubarak Shah
Benjamin Z. Yao
Trishul M. Chilimbi
VLM
67
1
0
12 Jul 2024
15M Multimodal Facial Image-Text Dataset
Dawei Dai
Yutang Li
Yingge Liu
Mingming Jia
Zhang YuanHui
Guoyin Wang
VLM
31
7
0
11 Jul 2024
CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
Shuang Hao
Chunlin Zhong
He Tang
31
1
0
09 Jul 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Yuhan Zhu
Yuyang Ji
Zhiyu Zhao
Gangshan Wu
Limin Wang
VLM
39
7
0
05 Jul 2024
CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection
Lin Zhu
Yifeng Yang
Qinying Gu
Xinbing Wang
Cheng Zhou
Nanyang Ye
VLM
34
2
0
26 May 2024
Enhanced Multimodal Content Moderation of Children's Videos using Audiovisual Fusion
Syed Hammad Ahmed
M. Khan
G. Sukthankar
29
1
0
09 May 2024
Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification
Siqi Yin
Lifan Jiang
24
0
0
03 May 2024
VG4D: Vision-Language Model Goes 4D Video Recognition
Zhichao Deng
Xiangtai Li
Xia Li
Yunhai Tong
Shen Zhao
Mengyuan Liu
3DPC
34
6
0
17 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
48
2
0
15 Apr 2024
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers
Lakshmi Nair
VLM
29
0
0
09 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
42
1
0
28 Mar 2024
Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model
Diwei Wang
Kun Yuan
Candice Müller
Frédéric Blanc
N. Padoy
Hyewon Seo
46
2
0
20 Mar 2024
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
Kun-Yu Lin
Henghui Ding
Jiaming Zhou
Yu-Ming Tang
Yi-Xing Peng
Zhilin Zhao
Chen Change Loy
Wei-Shi Zheng
VLM
37
15
0
03 Mar 2024
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
Xiaohui Huang
Hao Zhou
Kun Yao
Kai Han
VLM
54
19
0
05 Feb 2024
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Boyuan Jiang
Jun Chen
Jianbiao Mei
Xingxing Zuo
Guang Dai
Jingdong Wang
Yong-Jin Liu
VLM
28
3
0
22 Jan 2024
SYNC-CLIP: Synthetic Data Make CLIP Generalize Better in Data-Limited Scenarios
Mushui Liu
Weijie He
Ziqian Lu
Yunlong Yu
VLM
24
1
0
06 Dec 2023
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
15
7
0
30 Nov 2023
Hardware Resilience Properties of Text-Guided Image Classifiers
Syed Talal Wasim
Kabila Haile Soboka
Abdulrahman Mahmoud
Salman Khan
David Brooks
Gu-Yeon Wei
VLM
22
1
0
23 Nov 2023
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
Jieming Cui
Ziren Gong
Baoxiong Jia
Siyuan Huang
Zilong Zheng
Jianzhu Ma
Yixin Zhu
34
3
0
01 Nov 2023
Videoprompter: an ensemble of foundational models for zero-shot video understanding
Adeel Yousaf
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Mubarak Shah
VLM
35
2
0
23 Oct 2023
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Bipin Rajendran
Bashir M. Al-Hashimi
MLLM
VLM
30
2
0
27 Sep 2023
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Xin Li
Dongze Lian
Zhihe Lu
Jiawang Bai
Zhibo Chen
Xinchao Wang
VLM
43
60
0
24 Sep 2023
Delving into Multimodal Prompting for Fine-grained Visual Classification
Xin Jiang
Hao Tang
Junyao Gao
Xiaoyu Du
Shengfeng He
Zechao Li
VLM
21
22
0
16 Sep 2023
Opening the Vocabulary of Egocentric Actions
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
36
16
0
22 Aug 2023
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning
Qianqian Wang
Junlong Du
Ke Yan
Shouhong Ding
VLM
38
17
0
09 Aug 2023
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
VLM
24
13
0
06 Aug 2023