Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2112.04478
Cited By
Prompting Visual-Language Models for Efficient Video Understanding
8 December 2021
Chen Ju
Tengda Han
Kunhao Zheng
Ya-Qin Zhang
Weidi Xie
VPVLM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Prompting Visual-Language Models for Efficient Video Understanding"
50 / 74 papers shown
Title
Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models
Chen Wang
Fei Xia
Wenhao Yu
Tingnan Zhang
Ruohan Zhang
Ce Liu
Li Fei-Fei
Jie Tan
Jacky Liang
33
0
0
17 Apr 2025
Saliency-Motion Guided Trunk-Collateral Network for Unsupervised Video Object Segmentation
Xiangyu Zheng
Wanyun Li
Songcheng He
Jianping Fan
Xiaoqiang Li
We Zhang
VOS
35
0
0
08 Apr 2025
Post-processing for Fair Regression via Explainable SVD
Zhiqun Zuo
Ding Zhu
Mohammad Mahdi Khalili
157
0
0
04 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
41
0
0
02 Apr 2025
Prompt2LVideos: Exploring Prompts for Understanding Long-Form Multimodal Videos
Soumya Jahagirdar
Jayasree Saha
C. V. Jawahar
56
0
0
11 Mar 2025
Vision-aware Multimodal Prompt Tuning for Uploadable Multi-source Few-shot Domain Adaptation
Kuanghong Liu
Jin Wang
Kangjian He
Dan Xu
Xuejie Zhang
54
0
0
08 Mar 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
Enzo Tartaglione
VLM
109
3
0
05 Jan 2025
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen
Zizheng Huang
Y. Hong
Yanshuo Wang
Zhongcai Lyu
Zhuoer Xu
Jun Lan
Zhangxuan Gu
VLM
54
0
0
18 Nov 2024
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He
Weizhe Lin
Hao Zheng
Fan Zhang
Matt Jones
Laurence Aitchison
X. Xu
Miao Liu
Per Ola Kristensson
Junxiao Shen
77
2
0
01 Nov 2024
Zero-shot Action Localization via the Confidence of Large Vision-Language Models
Josiah Aklilu
Xiaohan Wang
Serena Yeung-Levy
57
1
0
18 Oct 2024
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning
William A. Stigall
62
0
0
14 Oct 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
Guiguang Ding
135
7
0
02 Sep 2024
Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts
Peng Wu
Xuerong Zhou
Guansong Pang
Zhiwei Yang
Qingsen Yan
Peng Wang
Yanning Zhang
28
9
0
12 Aug 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
52
1
0
09 Jul 2024
CLIPVQA:Video Quality Assessment via CLIP
Fengchuang Xing
Mingjie Li
Yuan-Gen Wang
Guopu Zhu
Xiaochun Cao
CLIP
ViT
40
4
0
06 Jul 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Yuhan Zhu
Yuyang Ji
Zhiyu Zhao
Gangshan Wu
Limin Wang
VLM
41
7
0
05 Jul 2024
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
Akshita Gupta
Aditya Arora
Sanath Narayan
Salman Khan
F. Khan
Graham W. Taylor
38
3
0
21 Jun 2024
Context-Enhanced Video Moment Retrieval with Large Language Models
Weijia Liu
Bo Miao
Jiuxin Cao
Xueling Zhu
Bo Liu
Mehwish Nasim
Ajmal Saeed Mian
34
2
0
21 May 2024
MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
Hongyu Qu
Rui Yan
Xiangbo Shu
Haoliang Gao
Peng Huang
Guo-Sen Xie
58
4
0
03 May 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
R
2
R^2
R
2
-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu
Jixuan He
Wanhua Li
Junsik Kim
D. Wei
Hanspeter Pfister
Chang Wen Chen
41
13
0
31 Mar 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
42
1
0
28 Mar 2024
PLOT-TAL -- Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization
Edward Fish
Jon Weinbren
Andrew Gilbert
44
1
0
27 Mar 2024
CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations
Yuwei Zhang
Yan Wu
Yanming Liu
Xinyue Peng
49
5
0
17 Mar 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya-Qin Zhang
Yanfeng Wang
39
10
0
17 Mar 2024
Multi-modal Stance Detection: New Datasets and Model
Bin Liang
Ang Li
Jingqian Zhao
Lin Gui
Min Yang
Yue Yu
Kam-Fai Wong
Ruifeng Xu
36
5
0
22 Feb 2024
Test-Time Personalization with Meta Prompt for Gaze Estimation
Huan Liu
Julia Qi
Zhenhao Li
Mohammad Hassanpour
Yang Wang
Konstantinos Plataniotis
Yuanhao Yu
34
4
0
03 Jan 2024
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
28
7
0
16 Oct 2023
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Bipin Rajendran
Bashir M. Al-Hashimi
MLLM
VLM
30
2
0
27 Sep 2023
AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation
Chaofan Ma
Yu-Hao Yang
Chen Ju
Fei Zhang
Ya-Qin Zhang
Yanfeng Wang
VLM
45
17
0
31 Aug 2023
UnLoc: A Unified Framework for Video Localization Tasks
Shengjia Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
31
53
0
21 Aug 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
37
7
0
27 Jul 2023
Language-based Action Concept Spaces Improve Video Self-Supervised Learning
Kanchana Ranasinghe
Michael S. Ryoo
SSL
VLM
37
12
0
20 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
40
8
0
18 Jul 2023
Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions
Dongshuo Yin
Xueting Han
Bin Li
Hao Feng
Jinghua Bai
VPVLM
26
17
0
16 Jun 2023
2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection
Yunkang Cao
Xiaohao Xu
Chen Sun
Y. Cheng
Liang Gao
Nong Sang
32
1
0
15 Jun 2023
Segment Any Anomaly without Training via Hybrid Prompt Regularization
Yunkang Cao
Xiaohao Xu
Chen Sun
Y. Cheng
Zongwei Du
Liang Gao
Nong Sang
VLM
37
70
0
18 May 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu
Jaemin Cho
Prateek Yadav
Joey Tianyi Zhou
45
129
0
11 May 2023
Efficient Multimodal Fusion via Interactive Prompting
Yaowei Li
Ruijie Quan
Linchao Zhu
Yezhou Yang
35
44
0
13 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
F. Khan
M. Shah
VLM
VPVLM
33
74
0
06 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
33
19
0
05 Apr 2023
Learning to Name Classes for Vision and Language Models
Sarah Parisot
Yongxin Yang
Steven G. McDonagh
VLM
17
10
0
04 Apr 2023
Probabilistic Prompt Learning for Dense Prediction
Hyeongjun Kwon
Taeyong Song
Somi Jeong
Jin-Hwa Kim
Jinhyun Jang
Kwanghoon Sohn
VLM
25
18
0
03 Apr 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
28
30
0
21 Mar 2023
Multi-modal Prompting for Low-Shot Temporal Action Localization
Chen Ju
Zeqian Li
Peisen Zhao
Ya-Qin Zhang
Xiaopeng Zhang
Qi Tian
Yanfeng Wang
Weidi Xie
39
18
0
21 Mar 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
Cheng Chen
Mu Li
ViT
58
144
0
06 Feb 2023
Affective Faces for Goal-Driven Dyadic Communication
Scott Geng
Revant Teotia
Purva Tendulkar
Sachit Menon
Carl Vondrick
VGen
26
18
0
26 Jan 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Yikang Shen
Yining Hong
Hao Zhang
Chuang Gan
LRM
VLM
33
35
0
12 Jan 2023
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
32
46
0
09 Dec 2022
Fine-tuned CLIP Models are Efficient Video Learners
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
F. Khan
CLIP
VLM
31
148
0
06 Dec 2022
1
2
Next