Title
Impossible Videos Zechen Bai Hai Ci Mike Zheng Shou EGVM VGen 72 0 0 18 Mar 2025
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition Otto Brookes Maksim Kukushkin Majid Mirmehdi Colleen Stephens Paula Dieguez ... Lukas Boesch Thomas Schmid M. Arandjelovic H. Kühl T. Burghardt 48 0 0 28 Feb 2025
Can masking background and object reduce static bias for zero-shot action recognition? Takumi Fukuzawa Kensho Hara Hirokatsu Kataoka Toru Tamaki 43 0 0 22 Jan 2025
Dynamic Scene Understanding from Vision-Language Representations Shahaf Pruss Morris Alper Hadar Averbuch-Elor OCL 164 0 0 20 Jan 2025
SEMU-Net: A Segmentation-based Corrector for Fabrication Process Variations of Nanophotonics with Microscopic Images Rambod Azimi Yijian Kong D. Gostimirovic James J. Clark O. Liboiron-Ladouceur 62 0 0 25 Nov 2024
STAA: Spatio-Temporal Attention Attribution for Real-Time Interpreting Transformer-based Video Models Zerui Wang Yan Liu 50 0 0 01 Nov 2024
Low-Latency Video Anonymization for Crowd Anomaly Detection: Privacy vs. Performance M. Asres Lei Jiao C. Omlin 31 0 0 24 Oct 2024
Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling Hao Wu Donglin Bai Shiqi Jiang Qianxi Zhang Y. Yang Ting Cao Yunxin Liu Fengyuan Xu 142 0 0 19 Oct 2024
MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations Liang Xu Shaoyang Hua Zili Lin Yifan Liu Feipeng Ma Yichao Yan Xin Jin Xiaokang Yang Wenjun Zeng VGen 39 3 0 17 Oct 2024
One missing piece in Vision and Language: A Survey on Comics Understanding Emanuele Vivoli Andrey Barsky Mohamed Ali Souibgui Artemis LLabres Marco Bertini Dimosthenis Karatzas 34 3 0 14 Sep 2024
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition Ahmed Abdelkawy Asem A. Ali Aly A. Farag 3DPC 26 0 0 10 Aug 2024
SignCLIP: Connecting Text and Sign Language by Contrastive Learning Zifan Jiang Gerard Sant Amit Moryossef Mathias Müller Rico Sennrich Sarah Ebling VLM CLIP 34 2 0 01 Jul 2024
SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks Yi Pan Jun-Jie Huang Zihan Chen Wentao Zhao Ziyue Wang 28 0 0 04 Jun 2024
ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos Sharana Dharshikgan Suresh Dass H. Barua Ganesh Krishnasamy Raveendran Paramesran Raphael C.-W. Phan ViT 23 2 0 09 Apr 2024
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling W. G. C. Bandara Vishal M. Patel VPVLM VLM 28 1 0 11 Mar 2024
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming Pengyuan Zhou Lin Wang Zhi Liu Yanbin Hao Pan Hui Sasu Tarkoma J. Kangasharju VGen 38 26 0 30 Jan 2024
Multi-model learning by sequential reading of untrimmed videos for action recognition Kodai Kamiya Toru Tamaki 23 0 0 26 Jan 2024
Video Understanding with Large Language Models: A Survey Yunlong Tang Jing Bi Siting Xu Luchuan Song Susan Liang ... Feng Zheng Jianguo Zhang Ping Luo Jiebo Luo Chenliang Xu VLM 50 82 0 29 Dec 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition Jiaming Zhou Hanjun Li Kun-Yu Lin Junwei Liang 21 1 0 28 Nov 2023
Student Classroom Behavior Detection based on Spatio-Temporal Network and Multi-Model Fusion Fan Yang Xiaofei Wang 24 1 0 25 Oct 2023
Proving the Potential of Skeleton Based Action Recognition to Automate the Analysis of Manual Processes Marlin Berger F. Cloppenburg Jens Eufinger Thomas Gries 18 0 0 12 Oct 2023
Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination Siyuan Jiang Yan Ding Yuling Wang Lei Xu Wenli Dai ... Jie Yu Jianqiao Zhou Chunquan Zhang Ping Liang Dexing Kong 11 0 0 10 Oct 2023
SCB-Dataset3: A Benchmark for Detecting Student Classroom Behavior Fan Yang Tao Wang 18 17 0 04 Oct 2023
Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning Sikiru Adewale Tosin Ige Bolanle Hafiz Matti VLM 9 4 0 02 Oct 2023
TransNet: A Transfer Learning-Based Network for Human Action Recognition Khaled Alomar Xiaohao Cai 27 1 0 13 Sep 2023
Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos Sarthak Bhagat Simon Stepputtis Joseph Campbell Katia P. Sycara 31 4 0 12 Sep 2023
Joint learning of images and videos with a single Vision Transformer Shuki Shimizu Toru Tamaki ViT 11 0 0 21 Aug 2023
The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation Giacomo Zara Alessandro Conti Subhankar Roy Stéphane Lathuilière Paolo Rota Elisa Ricci 25 11 0 17 Aug 2023
E2E-LOAD: End-to-End Long-form Online Action Detection Shuyuan Cao Weihua Luo Bairui Wang Wei Emma Zhang Lin Ma 25 5 0 13 Jun 2023
Student Classroom Behavior Detection based on Improved YOLOv7 Fan Yang 11 6 0 06 Jun 2023
Student Classroom Behavior Detection based on YOLOv7-BRA and Multi-Model Fusion Fan Yang Tao Wang Xiaofei Wang 6 13 0 13 May 2023
Learning Human-Human Interactions in Images from Weak Textual Supervision Morris Alper Hadar Averbuch-Elor VLM 37 2 0 27 Apr 2023
Video-based Contrastive Learning on Decision Trees: from Action Recognition to Autism Diagnosis Mindi Ruan Xiang Yu Naifeng Zhang Chuanbo Hu Shuo Wang Xin Li 28 8 0 20 Apr 2023
SCB-dataset: A Dataset for Detecting Student Classroom Behavior Yang Fan 9 10 0 05 Apr 2023
AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation Giacomo Zara Subhankar Roy Paolo Rota Elisa Ricci VLM 19 12 0 03 Apr 2023
Nearest-Neighbor Inter-Intra Contrastive Learning from Unlabeled Videos D. Fan De-Yun Yang Xinyu Li Vimal Bhat M. Rohith SSL 15 1 0 13 Mar 2023
AIM: Adapting Image Models for Efficient Video Action Recognition Taojiannan Yang Yi Zhu Yusheng Xie Aston Zhang C. L. P. Chen Mu Li ViT 44 144 0 06 Feb 2023
Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks Leonard Hacker Finn Bartels Pierre-Etienne Martin 16 6 0 06 Feb 2023
Gated-ViGAT: Efficient Bottom-Up Event Recognition and Explanation Using a New Frame Selection Policy and Gating Mechanism Nikolaos Gkalelis Dimitrios Daskalakis Vasileios Mezaris 13 4 0 18 Jan 2023
CNN-Based Action Recognition and Pose Estimation for Classifying Animal Behavior from Videos: A Survey Michael Perez Corey Toler-Franklin MedIm 28 14 0 15 Jan 2023
Efficient Robustness Assessment via Adversarial Spatial-Temporal Focus on Videos Xingxing Wei Songping Wang Huanqian Yan AAML 21 15 0 03 Jan 2023
Transformers in Action Recognition: A Review on Temporal Modeling Elham Shabaninia Hossein Nezamabadi-pour Fatemeh Shafizadegan ViT 21 8 0 29 Dec 2022
Deep set conditioned latent representations for action recognition Akash Singh Tom De Schepper Kevin Mets P. Hellinckx José Oramas Steven Latré BDL 6 2 0 21 Dec 2022
Egocentric Video Task Translation Zihui Xue Yale Song Kristen Grauman Lorenzo Torresani EgoV 21 13 0 13 Dec 2022
Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks M. Kowal Mennatullah Siam Md. Amirul Islam Neil D. B. Bruce Richard P. Wildes Konstantinos G. Derpanis FAtt 17 3 0 03 Nov 2022
End-to-end Transformer for Compressed Video Quality Enhancement Li Yu Wenshuai Chang Shiyu Wu M. Gabbouj ViT 19 8 0 25 Oct 2022
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation Xueliang Zhao Yuxuan Wang Chongyang Tao Chenshuo Wang Dongyan Zhao 41 6 0 22 Oct 2022
Physical Adversarial Attack meets Computer Vision: A Decade Survey Hui Wei Hao Tang Xuemei Jia Zhixiang Wang Han-Bing Yu Zhubo Li Shin'ichi Satoh Luc Van Gool Zheng Wang AAML 27 43 0 30 Sep 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks Junke Wang Dongdong Chen Zuxuan Wu Chong Luo Luowei Zhou Yucheng Zhao Yujia Xie Ce Liu Yu-Gang Jiang Lu Yuan MLLM VLM 30 148 0 15 Sep 2022
Active Learning with Effective Scoring Functions for Semi-Supervised Temporal Action Localization Ding Li Xuebing Yang Yongqiang Tang Chenyang Zhang Wensheng Zhang 29 4 0 31 Aug 2022