ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.03307
  4. Cited By
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

6 April 2023
Syed Talal Wasim
Muzammal Naseer
Salman Khan
F. Khan
M. Shah
    VLM
    VPVLM
ArXivPDFHTML

Papers citing "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting"

50 / 65 papers shown
Title
AnimalMotionCLIP: Embedding motion in CLIP for Animal Behavior Analysis
AnimalMotionCLIP: Embedding motion in CLIP for Animal Behavior Analysis
Enmin Zhong
Carlos R. del-Blanco
Daniel Berjón
F. Jaureguizar
Narciso N. García
34
0
0
30 Apr 2025
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Divya Velayudhan
A. Ahmed
Mohamad Alansari
Neha Gour
Abderaouf Behouch
...
Muzammal Naseer
Juergen Gall
Mohammed Bennamoun
Ernesto Damiani
Naoufel Werghi
47
0
0
03 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
41
0
0
02 Apr 2025
Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions
Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions
Thinesh Thiyakesan Ponbagavathi
Alina Roitberg
39
0
0
31 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Pengfei Zhu
Q. Hu
CLIP
48
0
0
24 Mar 2025
Prompt2LVideos: Exploring Prompts for Understanding Long-Form Multimodal Videos
Soumya Jahagirdar
Jayasree Saha
C. V. Jawahar
56
0
0
11 Mar 2025
Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity
Xiaohao Xu
Feng Xue
Xianrui Li
Haowei Li
S. M. I. Simon X. Yang
T. Zhang
Matthew Johnson-Roberson
Xiaonan Huang
3DV
43
0
0
08 Mar 2025
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu
Congqi Cao
Yifan Zhang
Yanning Zhang
VLM
43
0
0
27 Feb 2025
Parameter-Efficient Fine-Tuning for Foundation Models
Parameter-Efficient Fine-Tuning for Foundation Models
Dan Zhang
Tao Feng
Lilong Xue
Yuandong Wang
Yuxiao Dong
J. Tang
46
8
0
23 Jan 2025
AFANet: Adaptive Frequency-Aware Network for Weakly-Supervised Few-Shot
  Semantic Segmentation
AFANet: Adaptive Frequency-Aware Network for Weakly-Supervised Few-Shot Semantic Segmentation
Jiaqi Ma
Guo-Sen Xie
Fang Zhao
Zechao Li
32
0
0
23 Dec 2024
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen
Zizheng Huang
Y. Hong
Yanshuo Wang
Zhongcai Lyu
Zhuoer Xu
Jun Lan
Zhangxuan Gu
VLM
54
0
0
18 Nov 2024
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
Wentao Bao
K. Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM
ObjD
42
2
0
17 Nov 2024
BoostAdapter: Improving Vision-Language Test-Time Adaptation via
  Regional Bootstrapping
BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping
Taolin Zhang
J. T. Wang
Hang Guo
Tao Dai
Bin Chen
Shu-Tao Xia
VLM
TTA
19
0
0
20 Oct 2024
Storyboard guided Alignment for Fine-grained Video Action Recognition
Storyboard guided Alignment for Fine-grained Video Action Recognition
Enqi Liu
Liyuan Pan
Yan Yang
Yiran Zhong
Zhijing Wu
Xinxiao Wu
Liu Liu
33
0
0
18 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language
  to Video Knowledge Transfer
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
37
1
0
14 Oct 2024
Continual Learning Improves Zero-Shot Action Recognition
Continual Learning Improves Zero-Shot Action Recognition
Shreyank N. Gowda
Davide Moltisanti
Laura Sevilla-Lara
BDL
VLM
CLL
27
1
0
14 Oct 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViT
VLM
34
4
0
06 Sep 2024
Spatio-Temporal Context Prompting for Zero-Shot Action Detection
Spatio-Temporal Context Prompting for Zero-Shot Action Detection
Wei-Jhe Huang
Min-Hung Chen
Shang-Hong Lai
27
0
0
28 Aug 2024
Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action
  Recognition
Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition
Bozheng Li
Mushui Liu
Gaoang Wang
Yunlong Yu
26
5
0
22 Aug 2024
OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal
  Omni-Scale Feature Learning
OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
Mushui Liu
Bozheng Li
Yunlong Yu
VLM
23
9
0
12 Aug 2024
Task-Adapter: Task-specific Adaptation of Image Models for Few-shot
  Action Recognition
Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition
Congqi Cao
Guibiao Liao
Yating Yu
Kanglin Liu
Lingtong Min
Yanning Zhang
32
4
0
01 Aug 2024
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Tz-Ying Wu
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
EgoV
55
0
0
28 Jul 2024
Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot
  Whole Slide Image Classification
Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification
Linhao Qu
Dingkang Yang
Dan Huang
Qinhao Guo
Rongkui Luo
Shaoting Zhang
Xiaosong Wang
VLM
61
6
0
15 Jul 2024
Open Vocabulary Multi-Label Video Classification
Open Vocabulary Multi-Label Video Classification
Rohit Gupta
Mamshad Nayeem Rizve
Jayakrishnan Unnikrishnan
Ashish Tawari
Son Tran
Mubarak Shah
Benjamin Z. Yao
Trishul M. Chilimbi
VLM
67
1
0
12 Jul 2024
15M Multimodal Facial Image-Text Dataset
15M Multimodal Facial Image-Text Dataset
Dawei Dai
Yutang Li
Yingge Liu
Mingming Jia
Zhang YuanHui
Guoyin Wang
VLM
31
7
0
11 Jul 2024
CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient
  Object Detection
CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
Shuang Hao
Chunlin Zhong
He Tang
31
1
0
09 Jul 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting,
  and Transportation
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Yuhan Zhu
Yuyang Ji
Zhiyu Zhao
Gangshan Wu
Limin Wang
VLM
39
7
0
05 Jul 2024
CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD
  Generalization and Open-Set OOD Detection
CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection
Lin Zhu
Yifeng Yang
Qinying Gu
Xinbing Wang
Cheng Zhou
Nanyang Ye
VLM
34
2
0
26 May 2024
Enhanced Multimodal Content Moderation of Children's Videos using
  Audiovisual Fusion
Enhanced Multimodal Content Moderation of Children's Videos using Audiovisual Fusion
Syed Hammad Ahmed
M. Khan
G. Sukthankar
29
1
0
09 May 2024
Multi-method Integration with Confidence-based Weighting for Zero-shot
  Image Classification
Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification
Siqi Yin
Lifan Jiang
24
0
0
03 May 2024
VG4D: Vision-Language Model Goes 4D Video Recognition
VG4D: Vision-Language Model Goes 4D Video Recognition
Zhichao Deng
Xiangtai Li
Xia Li
Yunhai Tong
Shen Zhao
Mengyuan Liu
3DPC
34
6
0
17 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
48
2
0
15 Apr 2024
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using
  Embeddings as Teachers
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers
Lakshmi Nair
VLM
29
0
0
09 Apr 2024
Koala: Key frame-conditioned long video-LLM
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action
  Generalization
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
42
1
0
28 Mar 2024
Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge
  Augmentation in Vision Language Model
Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model
Diwei Wang
Kun Yuan
Candice Müller
Frédéric Blanc
N. Padoy
Hyewon Seo
46
2
0
20 Mar 2024
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary
  Action Recognition
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
Kun-Yu Lin
Henghui Ding
Jiaming Zhou
Yu-Ming Tang
Yi-Xing Peng
Zhilin Zhao
Chen Change Loy
Wei-Shi Zheng
VLM
37
15
0
03 Mar 2024
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action
  Recognition
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
Xiaohui Huang
Hao Zhou
Kun Yao
Kai Han
VLM
54
19
0
05 Feb 2024
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action
  Recognition
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Boyuan Jiang
Jun Chen
Jianbiao Mei
Xingxing Zuo
Guang Dai
Jingdong Wang
Yong-Jin Liu
VLM
28
3
0
22 Jan 2024
SYNC-CLIP: Synthetic Data Make CLIP Generalize Better in Data-Limited
  Scenarios
SYNC-CLIP: Synthetic Data Make CLIP Generalize Better in Data-Limited Scenarios
Mushui Liu
Weijie He
Ziqian Lu
Yunlong Yu
VLM
24
1
0
06 Dec 2023
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for
  General Video Recognition
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
15
7
0
30 Nov 2023
Hardware Resilience Properties of Text-Guided Image Classifiers
Hardware Resilience Properties of Text-Guided Image Classifiers
Syed Talal Wasim
Kabila Haile Soboka
Abdulrahman Mahmoud
Salman Khan
David Brooks
Gu-Yeon Wei
VLM
22
1
0
23 Nov 2023
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
Jieming Cui
Ziren Gong
Baoxiong Jia
Siyuan Huang
Zilong Zheng
Jianzhu Ma
Yixin Zhu
34
3
0
01 Nov 2023
Videoprompter: an ensemble of foundational models for zero-shot video
  understanding
Videoprompter: an ensemble of foundational models for zero-shot video understanding
Adeel Yousaf
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Mubarak Shah
VLM
35
2
0
23 Oct 2023
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Bipin Rajendran
Bashir M. Al-Hashimi
MLLM
VLM
30
2
0
27 Sep 2023
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Xin Li
Dongze Lian
Zhihe Lu
Jiawang Bai
Zhibo Chen
Xinchao Wang
VLM
43
60
0
24 Sep 2023
Delving into Multimodal Prompting for Fine-grained Visual Classification
Delving into Multimodal Prompting for Fine-grained Visual Classification
Xin Jiang
Hao Tang
Junyao Gao
Xiaoyu Du
Shengfeng He
Zechao Li
VLM
21
22
0
16 Sep 2023
Opening the Vocabulary of Egocentric Actions
Opening the Vocabulary of Egocentric Actions
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
36
16
0
22 Aug 2023
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion
  Prompts Learning
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning
Qianqian Wang
Junlong Du
Ke Yan
Shouhong Ding
VLM
38
17
0
09 Aug 2023
EventBind: Learning a Unified Representation to Bind Them All for
  Event-based Open-world Understanding
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
VLM
24
13
0
06 Aug 2023
12
Next