Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2207.07646
Cited By
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
15 July 2022
Rui Qian
Yeqing Li
Zheng Xu
Ming Yang
Serge Belongie
Huayu Chen
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models"
16 / 16 papers shown
Title
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
52
8
0
29 Feb 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Yuhang Zang
Hanlin Goh
Josh Susskind
Chen Huang
VLM
39
12
0
29 Jan 2024
Language-based Action Concept Spaces Improve Video Self-Supervised Learning
Kanchana Ranasinghe
Michael S. Ryoo
SSL
VLM
40
12
0
20 Jul 2023
MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation
Jie Guo
Qimeng Wang
Yan Gao
Xiaolong Jiang
Xu Tang
Yao Hu
Baochang Zhang
VLM
37
11
0
14 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
36
20
0
05 Apr 2023
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
32
46
0
09 Dec 2022
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
James Smith
Paola Cascante-Bonilla
Assaf Arbelle
Donghyun Kim
Yikang Shen
David D. Cox
Diyi Yang
Z. Kira
Rogerio Feris
Leonid Karlinsky
VLM
44
20
0
17 Nov 2022
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
152
362
0
17 Sep 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
348
2,271
0
02 Sep 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Nayeon Lee
Weicheng Kuo
Huayu Chen
VLM
ObjD
225
899
0
28 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
322
3,708
0
11 Feb 2021
Self-supervised Co-training for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
215
309
0
19 Oct 2020
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Biagio Brattoli
Joseph Tighe
Fedor Zhdanov
Pietro Perona
Krzysztof Chalupka
VLM
137
127
0
03 Mar 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He
Zhi-Li Zhang
Hang Zhang
Zhongyue Zhang
Junyuan Xie
Mu Li
221
1,399
0
04 Dec 2018
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
281
31,267
0
16 Jan 2013
1