ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Expanding Language-Image Pretrained Models for General Video Recognition
(arXiv:2208.02816)

4 August 2022
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
    VLM
    CLIP
    ViT

Papers citing "Expanding Language-Image Pretrained Models for General Video Recognition"

50 / 225 papers shown
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
Yuanyuan Mao
Xin Lin
Qin Ni
Liang He
29
3
0
12 Feb 2024
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
Xiaohui Huang
Hao Zhou
Kun Yao
Kai Han
VLM
57
19
0
05 Feb 2024
Taylor Videos for Action Recognition
Lei Wang
Xiuyuan Yuan
Tom Gedeon
Liang Zheng
26
6
0
05 Feb 2024
Spatio-temporal Prompting Network for Robust Video Feature Extraction
Guanxiong Sun
Chi Wang
Zhaoyu Zhang
Jiankang Deng
S. Zafeiriou
Yang Hua
ViT
17
4
0
04 Feb 2024
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Julie Tores
L. Sassatelli
Hui-Yin Wu
Clement Bergman
Lea Andolfi
...
F. Precioso
Thierry Devars
Magali Guaresi
Virginie Julliard
Sarah Lecossais
38
2
0
24 Jan 2024
On the Efficacy of Text-Based Input Modalities for Action Anticipation
Apoorva Beedu
Karan Samel
Irfan Essa
53
2
0
23 Jan 2024
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Boyuan Jiang
Jun Chen
Jianbiao Mei
Xingxing Zuo
Guang Dai
Jingdong Wang
Yong-Jin Liu
VLM
28
4
0
22 Jan 2024
Toward Robust Multimodal Learning using Multimodal Foundational Models
Xianbing Zhao
Soujanya Poria
Xuejiao Li
Yixin Chen
Buzhou Tang
VLM
37
2
0
20 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
29
5
0
18 Jan 2024
Towards A Better Metric for Text-to-Video Generation
Jay Zhangjie Wu
Guian Fang
Haoning Wu
Xintao Wang
Yixiao Ge
...
Rui Zhao
Weisi Lin
Wynne Hsu
Ying Shan
Mike Zheng Shou
VGen
37
34
0
15 Jan 2024
Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
Rongqin Liang
Yuanman Li
Jiantao Zhou
Xia Li
42
6
0
07 Jan 2024
4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency
Yuyang Yin
Dejia Xu
Zhangyang Wang
Yao-Min Zhao
Yunchao Wei
3DGS
54
72
0
28 Dec 2023
Open-Vocabulary Video Relation Extraction
Wentao Tian
Zheng Wang
Yu Fu
Jingjing Chen
Lechao Cheng
25
2
0
25 Dec 2023
Video Recognition in Portrait Mode
Mingfei Han
Linjie Yang
Xiaojie Jin
Jiashi Feng
Xiaojun Chang
Heng Wang
30
3
0
21 Dec 2023
TF-CLIP: Learning Text-free CLIP for Video-based Person Re-Identification
Chenyang Yu
Xuehu Liu
Yingquan Wang
Pingping Zhang
Huchuan Lu
VLM
27
21
0
15 Dec 2023
EZ-CLIP: Efficient Zeroshot Video Action Recognition
Shahzad Ahmad
S. Chanda
Y. S. Rawat
VLM
28
7
0
13 Dec 2023
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition
Chengyou Jia
Minnan Luo
Xiaojun Chang
Zhuohang Dang
Mingfei Han
Mengmeng Wang
Guangwen Dai
Sizhe Dang
Jingdong Wang
VLM
26
4
0
04 Dec 2023
RTQ: Rethinking Video-language Understanding Based on Image-text Model
Xiao Wang
Yaoyu Li
Tian Gan
Zheng Zhang
Jingjing Lv
Liqiang Nie
19
6
0
01 Dec 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition
Dongho Lee
Jongseo Lee
Jinwoo Choi
EgoV
35
12
0
30 Nov 2023
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
23
8
0
30 Nov 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
95
9
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
33
6
0
27 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
31
3
0
25 Nov 2023
Hardware Resilience Properties of Text-Guided Image Classifiers
Syed Talal Wasim
Kabila Haile Soboka
Abdulrahman Mahmoud
Salman Khan
David Brooks
Gu-Yeon Wei
VLM
22
1
0
23 Nov 2023
Language-guided Few-shot Semantic Segmentation
Jing Wang
Yuang Liu
Qiang-feng Zhou
Fan Wang
VLM
22
3
0
23 Nov 2023
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Shehan Munasinghe
Rusiru Thushara
Muhammad Maaz
H. Rasheed
Salman Khan
Mubarak Shah
Fahad Khan
VLM
MLLM
27
34
0
22 Nov 2023
Breathing Life Into Sketches Using Text-to-Video Priors
Rinon Gal
Yael Vinker
Yuval Alaluf
Amit H. Bermano
Daniel Cohen-Or
Ariel Shamir
Gal Chechik
VGen
DiffM
32
29
0
21 Nov 2023
Open-Vocabulary Video Anomaly Detection
Peng Wu
Xuerong Zhou
Guansong Pang
Yujia Sun
Jing Liu
Peng Wang
Yanning Zhang
VLM
32
22
0
13 Nov 2023
LabelFormer: Object Trajectory Refinement for Offboard Perception from LiDAR Point Clouds
Anqi Joyce Yang
Sergio Casas
Nikita Dvornik
Sean Segal
Yuwen Xiong
Jordan Sir Kwang Hu
Carter Fang
R. Urtasun
39
6
0
02 Nov 2023
Videoprompter: an ensemble of foundational models for zero-shot video understanding
Adeel Yousaf
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Mubarak Shah
VLM
38
2
0
23 Oct 2023
RoboCLIP: One Demonstration is Enough to Learn Robot Policies
S. Sontakke
Jesse Zhang
Sébastien M. R. Arnold
Karl Pertsch
Erdem Biyik
Dorsa Sadigh
Chelsea Finn
Laurent Itti
OffRL
27
66
0
11 Oct 2023
Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching
Hao Zhang
Lumin Xu
Shenqi Lai
Wenqi Shao
Nanning Zheng
Ping Luo
Yu Qiao
Kaipeng Zhang
ObjD
VLM
27
8
0
08 Oct 2023
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data
Zuxuan Wu
Zejia Weng
Wujian Peng
Xitong Yang
Ang Li
Larry S. Davis
Yu-Gang Jiang
CLIP
VLM
36
21
0
08 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
35
8
0
02 Oct 2023
Telling Stories for Common Sense Zero-Shot Action Recognition
Shreyank N. Gowda
Carolina Scarton
LM&Ro
27
2
0
29 Sep 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu
Chen Li
Yixiao Ge
Ying Shan
Thomas H. Li
Ge Li
25
28
0
27 Sep 2023
VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning
Yanan Wang
Donghuo Zeng
Shinya Wada
Satoshi Kurihara
32
6
0
27 Sep 2023
Delving into Multimodal Prompting for Fine-grained Visual Classification
Xin Jiang
Hao Tang
Junyao Gao
Xiaoyu Du
Shengfeng He
Zechao Li
VLM
29
22
0
16 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
27
18
0
14 Sep 2023
Perceptual Quality Assessment of 360° Images Based on Generative Scanpath Representation
Xiangjie Sui
Hanwei Zhu
Xuelin Liu
Yuming Fang
Shiqi Wang
Zhou Wang
43
6
0
07 Sep 2023
ATM: Action Temporality Modeling for Video Question Answering
Junwen Chen
Jie Zhu
Yu Kong
24
1
0
05 Sep 2023
Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
Irfan Essa
SSL
23
3
0
03 Sep 2023
Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception
Riley Tavassoli
Mani Amani
Reza Akhavian
33
1
0
31 Aug 2023
Robust Activity Recognition for Adaptive Worker-Robot Interaction using Transfer Learning
Farid Shahnavaz
Riley Tavassoli
Reza Akhavian
22
1
0
28 Aug 2023
Prompting Visual-Language Models for Dynamic Facial Expression Recognition
Zengqun Zhao
Ioannis Patras
VLM
13
33
0
25 Aug 2023
VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
Peng Wu
Xu Zhou
Guansong Pang
Lingru Zhou
Qingsen Yan
Peng Wang
Yanning Zhang
CLIP
VLM
21
67
0
22 Aug 2023
UnLoc: A Unified Framework for Video Localization Tasks
Shengjia Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
31
53
0
21 Aug 2023
Orthogonal Temporal Interpolation for Zero-Shot Video Recognition
Yan Zhu
Junbao Zhuo
B. Ma
Jiajia Geng
Xiaoming Wei
Xiaolin K. Wei
Shuhui Wang
VLM
25
5
0
14 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
35
9
0
10 Aug 2023
Cross-Domain Product Representation Learning for Rich-Content E-Commerce
Xuehan Bai
Yan Li
Yong Cheng
Wenjie Yang
Quanming Chen
Han Li
19
3
0
10 Aug 2023