ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2208.02816
  4. Cited By
Expanding Language-Image Pretrained Models for General Video Recognition

Expanding Language-Image Pretrained Models for General Video Recognition

4 August 2022
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
    VLM
    CLIP
    ViT
ArXivPDFHTML

Papers citing "Expanding Language-Image Pretrained Models for General Video Recognition"

50 / 225 papers shown
Title
Fine-Tuning Video-Text Contrastive Model for Primate Behavior Retrieval from Unlabeled Raw Videos
Fine-Tuning Video-Text Contrastive Model for Primate Behavior Retrieval from Unlabeled Raw Videos
Giulio Cesare Mastrocinque Santo
Patrícia Izar
Irene Delval
Victor de Napole Gregolin
Nina S. T. Hirata
VGen
40
0
0
08 May 2025
AnimalMotionCLIP: Embedding motion in CLIP for Animal Behavior Analysis
AnimalMotionCLIP: Embedding motion in CLIP for Animal Behavior Analysis
Enmin Zhong
Carlos R. del-Blanco
Daniel Berjón
F. Jaureguizar
Narciso N. García
34
0
0
30 Apr 2025
ResNetVLLM -- Multi-modal Vision LLM for the Video Understanding Task
ResNetVLLM -- Multi-modal Vision LLM for the Video Understanding Task
Ahmad Khalil
Mahmoud Khalil
A. Ngom
VLM
42
1
0
20 Apr 2025
Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset
Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset
Elisa Ancarani
Julie Tores
L. Sassatelli
Rémy Sun
Hui-Yin Wu
F. Precioso
29
0
0
15 Apr 2025
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Y. S. Rawat
SSL
194
0
0
08 Apr 2025
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Piyush Bagad
Hazel Doughty
Bernard Ghanem
Cees G. M. Snoek
ViT
SSL
52
0
0
08 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
41
0
0
02 Apr 2025
The HCI GenAI CO2ST Calculator: A Tool for Calculating the Carbon Footprint of Generative AI Use in Human-Computer Interaction Research
The HCI GenAI CO2ST Calculator: A Tool for Calculating the Carbon Footprint of Generative AI Use in Human-Computer Interaction Research
Nanna Inie
Jeanette Falk
Raghavendra Selvan
46
0
0
01 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
59
0
0
01 Apr 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
51
0
0
30 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Pengfei Zhu
Q. Hu
CLIP
48
0
0
24 Mar 2025
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun V. Reddy
Alexander Martin
Eugene Yang
Andrew Yates
Kate Sanders
Kenton W. Murray
Reno Kriz
Celso M. De Melo
Benjamin Van Durme
Rama Chellappa
50
1
0
24 Mar 2025
LATMOS: Latent Automaton Task Model from Observation Sequences
Weixiao Zhan
Qiyue Dong
Eduardo Sebastián
Nikolay Atanasov
53
0
0
11 Mar 2025
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion
Ziyi Yang
Fanqi Wan
Longguang Zhong
Canbin Huang
Guosheng Liang
Xiaojun Quan
MoMe
92
0
0
06 Mar 2025
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu
Congqi Cao
Yifan Zhang
Yanning Zhang
VLM
43
0
0
27 Feb 2025
Conformal Predictions for Human Action Recognition with Vision-Language Models
Conformal Predictions for Human Action Recognition with Vision-Language Models
Bary Tim
Fuchs Clément
Macq Benoît
VLM
51
0
0
10 Feb 2025
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Hengzhu Tang
Zefeng Zhang
Zhiping Li
Zhenyu Zhang
Xing Wu
Li Gao
Suqi Cheng
Dawei Yin
59
1
0
09 Feb 2025
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
Aaron Lohner
Francesco Compagno
Jonathan M Francis
A. Oltramari
57
2
0
10 Jan 2025
Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems
Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems
Wen-Dong Jiang
Chih-Yung Chang
Hsiang-Chuan Chang
Ji-Yuan Chen
Diptendu Sinha Roy
38
0
0
31 Dec 2024
LEARN: A Unified Framework for Multi-Task Domain Adapt Few-Shot Learning
LEARN: A Unified Framework for Multi-Task Domain Adapt Few-Shot Learning
Bharadwaj Ravichandran
Alexander Lynch
S. Brockman
Brandon RichardWebster
Dawei Du
A. Hoogs
Christopher Funk
ObjD
VLM
70
0
0
20 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
74
0
0
18 Dec 2024
Foundation Models and Adaptive Feature Selection: A Synergistic Approach
  to Video Question Answering
Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering
Sai Bhargav Rongali
M. Cui
Ankit Jha
Neha Bhargava
Saurabh Prasad
Biplab Banerjee
79
0
0
12 Dec 2024
Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal
  Consistency and Rigidity Constraints
Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints
Gaurav Rai
Ojaswa Sharma
VGen
DiffM
69
3
0
28 Nov 2024
ACE: Action Concept Enhancement of Video-Language Models in Procedural
  Videos
ACE: Action Concept Enhancement of Video-Language Models in Procedural Videos
Reza Ghoddoosian
Nakul Agarwal
Isht Dwivedi
Behzad Darisuh
68
0
0
23 Nov 2024
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic
  Surgical Video-Language Pretraining
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu
Kun Yuan
Yaling Shen
Feilong Tang
Xiaohao Xu
...
Jin Ye
N. Padoy
Nassir Navab
Junjun He
Zongyuan Ge
VLM
CLIP
95
11
0
23 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
98
0
0
20 Nov 2024
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen
Zizheng Huang
Y. Hong
Yanshuo Wang
Zhongcai Lyu
Zhuoer Xu
Jun Lan
Zhangxuan Gu
VLM
54
0
0
18 Nov 2024
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
Wentao Bao
K. Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM
ObjD
42
2
0
17 Nov 2024
GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot
  Anomaly Detection
GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection
Jiyul Ham
Yonggon Jung
Jun-Geol Baek
VLM
41
1
0
09 Nov 2024
ESC-MISR: Enhancing Spatial Correlations for Multi-Image
  Super-Resolution in Remote Sensing
ESC-MISR: Enhancing Spatial Correlations for Multi-Image Super-Resolution in Remote Sensing
Zhihui Zhang
Jinhui Pang
Jianan Li
Xiaoshuai Hao
30
0
0
07 Nov 2024
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
F. Khan
Salman Khan
MLLM
VGen
VLM
44
6
0
07 Nov 2024
Pseudo-labeling with Keyword Refining for Few-Supervised Video
  Captioning
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning
Ping Li
Tao Wang
Xinkui Zhao
Xianghua Xu
Mingli Song
34
3
0
06 Nov 2024
AM Flow: Adapters for Temporal Processing in Action Recognition
AM Flow: Adapters for Temporal Processing in Action Recognition
Tanay Agrawal
Abid Ali
A. Dantcheva
François Brémond
39
0
0
04 Nov 2024
Storyboard guided Alignment for Fine-grained Video Action Recognition
Storyboard guided Alignment for Fine-grained Video Action Recognition
Enqi Liu
Liyuan Pan
Yan Yang
Yiran Zhong
Zhijing Wu
Xinxiao Wu
Liu Liu
33
0
0
18 Oct 2024
SDS -- See it, Do it, Sorted: Quadruped Skill Synthesis from Single
  Video Demonstration
SDS -- See it, Do it, Sorted: Quadruped Skill Synthesis from Single Video Demonstration
Jeffrey Li
Maria Stamatopoulou
Dimitrios Kanoulas
20
1
0
15 Oct 2024
When Does Perceptual Alignment Benefit Vision Representations?
When Does Perceptual Alignment Benefit Vision Representations?
Shobhita Sundaram
Stephanie Fu
Lukas Muttenthaler
Netanel Y. Tamir
Lucy Chai
Simon Kornblith
Trevor Darrell
Phillip Isola
49
6
1
14 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language
  to Video Knowledge Transfer
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
37
1
0
14 Oct 2024
Continual Learning Improves Zero-Shot Action Recognition
Continual Learning Improves Zero-Shot Action Recognition
Shreyank N. Gowda
Davide Moltisanti
Laura Sevilla-Lara
BDL
VLM
CLL
29
1
0
14 Oct 2024
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large
  Language and Vision Models
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models
Juseong Jin
Chang Wook Jeong
27
3
0
13 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Enhancing Temporal Modeling of Video LLMs via Time Gating
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
M. Lyu
Liwei Wang
VLM
28
0
0
08 Oct 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024
MaskedMimic: Unified Physics-Based Character Control Through Masked
  Motion Inpainting
MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting
Chen Tessler
Yunrong Guo
Ofir Nabati
Gal Chechik
Xue Bin Peng
VGen
AI4CE
34
30
0
22 Sep 2024
Adaptive Robot Perception in Construction Environments using 4D BIM
Adaptive Robot Perception in Construction Environments using 4D BIM
Mani Amani
Reza Akhavian
AI4CE
26
1
0
20 Sep 2024
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
Yongqi Wang
Xinxiao Wu
Shuo Yang
Jiebo Luo
134
1
0
19 Sep 2024
From Experts to the Public: Governing Multimodal Language Models in
  Politically Sensitive Video Analysis
From Experts to the Public: Governing Multimodal Language Models in Politically Sensitive Video Analysis
Tanusree Sharma
Yujin Potter
Zachary Kilhoffer
Yun Huang
Dawn Song
Yang Wang
51
3
0
15 Sep 2024
Text-Enhanced Zero-Shot Action Recognition: A training-free approach
Text-Enhanced Zero-Shot Action Recognition: A training-free approach
Massimo Bosetti
Shibingfeng Zhang
Bendetta Liberatori
Giacomo Zara
Elisa Ricci
Paolo Rota
VLM
49
0
0
29 Aug 2024
Spatio-Temporal Context Prompting for Zero-Shot Action Detection
Spatio-Temporal Context Prompting for Zero-Shot Action Detection
Wei-Jhe Huang
Min-Hung Chen
Shang-Hong Lai
32
0
0
28 Aug 2024
Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action
  Recognition
Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition
Bozheng Li
Mushui Liu
Gaoang Wang
Yunlong Yu
33
5
0
22 Aug 2024
Audio Description Customization
Audio Description Customization
Rosiana Natalie
Ruei-Che Chang
Smitha Sheshadri
Anhong Guo
Kotaro Hara
24
4
0
21 Aug 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Guozhen Zhang
Jingyu Liu
Shengming Cao
Xiaotong Zhao
Kevin Zhao
Kai Ma
Limin Wang
ViT
29
1
0
13 Aug 2024
12345
Next