Fully Transformer-Equipped Architecture for End-to-End Referring Video Object Segmentation P. Li Yu Zhang L. Yuan Xianghua Xu VOS 29 6 0 21 Sep 2023
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models Yuan Tseng Layne Berry Yi-Ting Chen I-Hsiang Chiu Hsuan-Hao Lin ... Yu Tsao Shinji Watanabe Abdel-rahman Mohamed Chi-Luen Feng Hung-yi Lee VLM SSL 66 14 0 19 Sep 2023
Collaborative Three-Stream Transformers for Video Captioning Hao Wang Libo Zhang Hengrui Fan Tiejian Luo 41 6 0 18 Sep 2023
AV-MaskEnhancer: Enhancing Video Representations through Audio-Visual Masked Autoencoder Xingjian Diao Ming Cheng Shitong Cheng VGen 32 8 0 15 Sep 2023
Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval Rui Deng Qian Wu Yuke Li Haoran Fu 26 2 0 15 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning Zhiwu Qing Shiwei Zhang Ziyuan Huang Yingya Zhang Changxin Gao Deli Zhao Nong Sang 37 18 0 14 Sep 2023
Generative Image Dynamics Zhengqi Li Richard Tucker Noah Snavely Aleksander Holynski DiffM 48 63 0 14 Sep 2023
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning Palaash Agrawal Haidi Azaman Cheston Tan 56 3 0 13 Sep 2023
Enhancing multimodal cooperation via sample-level modality valuation Yake Wei Ruoxuan Feng Zihe Wang Di Hu 38 11 0 12 Sep 2023
JOADAA: joint online action detection and action anticipation Mohammed Guermal François Brémond Rui Dai Abid Ali 37 6 0 12 Sep 2023
Can we predict the Most Replayed data of video streaming platforms? Alessandro Duico Ombretta Strafforello Jan van Gemert 24 1 0 12 Sep 2023
SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition Cong Wu Xiaojun Wu Josef Kittler Tianyang Xu Sara Atito Muhammad Awais Zhenhua Feng 43 3 0 11 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture Meng Cui Xubo Liu Haohe Liu Zhuangzhuang Du Tao Chen Guoping Lian Daoliang Li Wenwu Wang 34 5 0 10 Sep 2023
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion Yujin Jeong Won-Wha Ryoo Seunghyun Lee Dabin Seo Wonmin Byeon Sangpil Kim Jinkyu Kim DiffM 32 29 0 08 Sep 2023
CDFSL-V: Cross-Domain Few-Shot Learning for Videos Sarinda Samarasinghe Mamshad Nayeem Rizve Navid Kardan M. Shah 27 11 0 07 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding Yue Xu Yong-Lu Li Zhemin Huang Michael Xu Liu Cewu Lu Yu-Wing Tai Chi-Keung Tang EgoV 33 9 0 05 Sep 2023
AAN: Attributes-Aware Network for Temporal Action Detection Rui Dai Srijan Das Michael S. Ryoo François Brémond 32 4 0 01 Sep 2023
Towards Contrastive Learning in Music Video Domain Karel Veldkamp Mariya Hendriksen Zoltán Szlávik Alexander Keijser SSL 37 2 0 01 Sep 2023
RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability Chuning Zhu Max Simchowitz Siri Gadipudi Abhishek Gupta 46 13 0 31 Aug 2023
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction Umar Khalid Hasan Iqbal Saeed Vahidian Jing Hua Chong Chen 21 3 0 29 Aug 2023
Evaluation of Key Spatiotemporal Learners for Print Track Anomaly Classification Using Melt Pool Image Streams Lynn Cherif Mutahar Safdar Guy Lamouche P. Wanjara P. Paul G. Wood Max Zimmermann F. Hannesen Yao Zhao 36 1 0 28 Aug 2023
Learning to Read Analog Gauges from Synthetic Data Juan Carlos León Alcázar Yazeed Alnumay Cheng Zheng Hassane Trigui Sahejad Patel Guohao Li 11 3 0 28 Aug 2023
Improving Video Violence Recognition with Human Interaction Learning on 3D Skeleton Point Clouds Yukun Su Guosheng Lin Qingyao Wu 3DH 3DPC 29 3 0 26 Aug 2023
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers Matthew Dutson Yin Li M. Gupta ViT 45 8 0 25 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning D. Fan Jue Wang Shuai Liao Yi Zhu Vimal Bhat H. Santos-Villalobos M. Rohith Xinyu Li VGen 37 19 0 24 Aug 2023
An All Deep System for Badminton Game Analysis Po-Yung Chou Yu-Chun Lo Bo Xie Chu-Hsing Lin Yu-Yung Kao 17 0 0 24 Aug 2023
NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos Ziyuan Yang Sucheng Ren Zongwei Wu Nanxuan Zhao Junle Wang Jing Qin Shengfeng He 41 2 0 23 Aug 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation Hejun Xiao Kunyu Peng Xiangsheng Huang Alina Roitberg Hao Li Zhao Wang Rainer Stiefelhagen 28 3 0 23 Aug 2023
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization Emanuele Bugliarello Hernan Moraldo Ruben Villegas Mohammad Babaeizadeh M. Saffar Han Zhang D. Erhan V. Ferrari Pieter-Jan Kindermans P. Voigtlaender VGen 41 10 0 22 Aug 2023
Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition Qitong Wang Long Zhao Liangzhe Yuan Ting Liu Xi Peng 36 12 0 22 Aug 2023
Audio-Visual Class-Incremental Learning Weiguo Pian Shentong Mo Yunhui Guo Yapeng Tian CLL VLM 33 28 0 21 Aug 2023
TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection Joe Fioresi I. Dave M. Shah 43 18 0 21 Aug 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs Fangyun Wei Yutong Chen SLR 33 28 0 21 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding Bingkun Huang Zhiyu Zhao Guozhen Zhang Yu Qiao Limin Wang 44 30 0 21 Aug 2023
Self-Feedback DETR for Temporal Action Detection Jihwan Kim Miso Lee Jae-Pil Heo 53 18 0 21 Aug 2023
Joint learning of images and videos with a single Vision Transformer Shuki Shimizu Toru Tamaki ViT 24 0 0 21 Aug 2023
Learnt Contrastive Concept Embeddings for Sign Recognition Ryan Wong Necati Cihan Camgöz Richard Bowden 29 5 0 18 Aug 2023
Audio-Visual Glance Network for Efficient Video Recognition Muhammad Adi Nugroho Sangmin Woo Sumin Lee Changick Kim 24 5 0 18 Aug 2023
The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation Giacomo Zara Alessandro Conti Subhankar Roy Stéphane Lathuilière Paolo Rota Elisa Ricci 33 11 0 17 Aug 2023
Memory-and-Anticipation Transformer for Online Action Understanding Jiahao Wang Guo Chen Yifei Huang Limin Wang Tong Lu OffRL 62 37 0 15 Aug 2023
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation Hong Li Xingyu Li Pengbo Hu Yinuo Lei Chunxiao Li Yi Zhou 49 22 0 15 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding Ziyuan Huang Shiwei Zhang Liang Pan Zhiwu Qing Yingya Zhang Ziwei Liu Marcelo H. Ang 46 9 0 10 Aug 2023
PAT: Position-Aware Transformer for Dense Multi-Label Action Detection Faegheh Sardari A. Mustafa Philip J. B. Jackson A. Hilton ViT 27 6 0 09 Aug 2023
JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset Student-Teacher Scenario for Video Action Recognition L. Bicsi B. Alexe Radu Tudor Ionescu Marius Leordeanu 22 2 0 09 Aug 2023
Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction Izzeddin Teeti Rongali Sai Bhargav Vivek Singh Andrew Bradley Biplab Banerjee Fabio Cuzzolin 19 1 0 08 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation Shuangrui Ding Peisen Zhao Xiaopeng Zhang Rui Qian H. Xiong Qi Tian ViT 29 16 0 08 Aug 2023
A Survey on Deep Learning-based Spatio-temporal Action Detection Peng Wang Fanwei Zeng Yu Qian 34 5 0 03 Aug 2023
TS-RGBD Dataset: a Novel Dataset for Theatre Scenes Description for People with Visual Impairments Leyla Benhamida Khadidja Delloul S. Larabi 16 1 0 02 Aug 2023
Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment Hongbo Liu Ming-Kun Wu Kun Yuan Ming-Ting Sun Yansong Tang Chuanchuan Zheng Xingsen Wen Xiu Li 47 17 0 01 Aug 2023
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment Kun Yuan Zishang Kong Chuanchuan Zheng Ming-Ting Sun Xingsen Wen ViT 32 14 0 31 Jul 2023