BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval Ning Han Jingjing Chen Chuhao Shi Yawen Zeng Guangyi Xiao Hao Chen 33 10 0 29 Oct 2021
Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition Liang Xu Cuiling Lan Wenjun Zeng Cewu Lu 35 24 0 28 Oct 2021
DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations Fei Deng Ingook Jang Sungjin Ahn VLM 40 62 0 27 Oct 2021
Zero-Shot Action Recognition from Diverse Object-Scene Compositions Carlo Bretti Pascal Mettes OCL 11 9 0 26 Oct 2021
Self-Supervised Knowledge Transfer via Loosely Supervised Auxiliary Tasks Seungbum Hong Jihun Yoon Junmo Kim Min-Kook Choi SSL 21 1 0 25 Oct 2021
Using Motion History Images with 3D Convolutional Networks in Isolated Sign Language Recognition Hamed Valizadegan D. Caldwell SLR 37 48 0 24 Oct 2021
A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark Zhenxi Zhu Limin Wang Sheng Guo Gangshan Wu 56 32 0 24 Oct 2021
Rethinking Generalization Performance of Surgical Phase Recognition with Expert-Generated Annotations Seungbum Hong Jiwon Lee Bokyung Park Ahmed A. Alwusaibie Anwar H. Alfadhel Sunghyun Park W. Hyung Min-Kook Choi 27 2 0 22 Oct 2021
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation Khoa T. Vo Kevin Hyekang Joo Kashu Yamazaki Sang Truong Kris Kitani Minh-Triet Tran Ngan Le EgoV 73 17 0 21 Oct 2021
Video and Text Matching with Conditioned Embeddings Ameen Ali Idan Schwartz Tamir Hazan Lior Wolf 100 13 0 21 Oct 2021
Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection Shraman Pramanick A. Roy Vishal M. Patel 40 57 0 21 Oct 2021
Few-Shot Temporal Action Localization with Query Adaptive Transformer Sauradip Nag Xiatian Zhu Tao Xiang 11 19 0 20 Oct 2021
GTM: Gray Temporal Model for Video Recognition Yanping Zhang Yongxin Yu 33 0 0 20 Oct 2021
Self-Supervised Representation Learning: Introduction, Advances and Challenges Linus Ericsson Henry Gouk Chen Change Loy Timothy M. Hospedales SSL OOD AI4TS 42 275 0 18 Oct 2021
Boosting the Transferability of Video Adversarial Examples via Temporal Translation Zhipeng Wei Jingjing Chen Zuxuan Wu Yu-Gang Jiang AAML 43 32 0 18 Oct 2021
Visual-aware Attention Dual-stream Decoder for Video Captioning Zhixin Sun Xian Zhong Shuqin Chen Lin Li Luo Zhong 38 3 0 16 Oct 2021
"Knights": First Place Submission for VIPriors21 Action Recognition Challenge at ICCV 2021 Ishan R. Dave Naman Biyani Brandon Clark Rohit Gupta Yogesh S Rawat M. Shah ViT 40 3 0 14 Oct 2021
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks Sangeeta Srivastava Yun Wang Andros Tjandra Anurag Kumar Chunxi Liu Kritika Singh Yatharth Saraf SSL 38 24 0 14 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 284 1,037 0 13 Oct 2021
Object-Region Video Transformers Roei Herzig Elad Ben-Avraham K. Mangalam Amir Bar Gal Chechik Anna Rohrbach Trevor Darrell Amir Globerson ViT 43 82 0 13 Oct 2021
TAda! Temporally-Adaptive Convolutions for Video Understanding Ziyuan Huang Shiwei Zhang Liang Pan Zhiwu Qing Mingqian Tang Ziwei Liu M. Ang 53 49 0 12 Oct 2021
Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition M. C. Leong Hui Li Tan Haosong Zhang Liyuan Li Feng Lin J. Lim 51 10 0 12 Oct 2021
Action-Sufficient State Representation Learning for Control with Structural Constraints Erdun Gao Chaochao Lu Liu Leqi José Miguel Hernández-Lobato Clark Glymour Bernhard Schölkopf Kun Zhang 59 32 0 12 Oct 2021
A Multi-viewpoint Outdoor Dataset for Human Action Recognition Asanka G. Perera Yee Wei Law T. Ogunwa J. Chahl 28 40 0 07 Oct 2021
Scaling up instance annotation via label propagation Dim P. Papadopoulos Ethan Weber Antonio Torralba ISeg 51 10 0 05 Oct 2021
Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction Rishubh Parihar Gaurav Ramola Ranajit Saha Raviprasad Kini Aniket Rege S. Velusamy 36 1 0 03 Oct 2021
Disarranged Zone Learning (DZL): An unsupervised and dynamic automatic stenosis recognition methodology based on coronary angiography Yanan Dai P. Zhu Bangde Xue Yun Ling Xibao Shi Liang Geng Qi Zhang Jun Liu 6 0 0 03 Oct 2021
CoSeg: Cognitively Inspired Unsupervised Generic Event Segmentation Tianlin Li Jingen Liu Tao Mei Jiebo Luo 24 7 0 30 Sep 2021
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device Ji Lin Chuang Gan Kuan-Chieh Wang Song Han 45 64 0 27 Sep 2021
Self-Supervised Video Representation Learning by Video Incoherence Detection Haozhi Cao Yuecong Xu Jianfei Yang K. Mao Lihua Xie Jianxiong Yin Simon See SSL 33 6 0 26 Sep 2021
Multi-Source Video Domain Adaptation with Temporal Attentive Moment Alignment Yuecong Xu Jianfei Yang Haozhi Cao Keyu Wu Min-man Wu Rui Zhao Zhenghua Chen TTA 45 22 0 21 Sep 2021
Survey: Transformer based Video-Language Pre-training Ludan Ruan Qin Jin VLM ViT 72 44 0 21 Sep 2021
Towards High-Quality Temporal Action Detection with Sparse Proposals Jiannan Wu Pei Sun Shoufa Chen Jiewen Yang Zihao Qi Lan Ma Ping Luo ViT 43 10 0 18 Sep 2021
Unsupervised View-Invariant Human Posture Representation Faegheh Sardari Bjorn Ommer Majid Mirmehdi 3DH 42 3 0 17 Sep 2021
MovieCuts: A New Dataset and Benchmark for Cut Type Recognition Alejandro Pardo Fabian Caba Heilbron Juan Carlos León Alcázar Ali K. Thabet Guohao Li VGen 47 28 0 12 Sep 2021
A Survey on Multi-modal Summarization Anubhav Jangra Sourajit Mukherjee Adam Jatowt S. Saha M. Hasanuzzaman 44 60 0 11 Sep 2021
Self Supervision to Distillation for Long-Tailed Visual Recognition Tianhao Li Limin Wang Gangshan Wu 45 102 0 09 Sep 2021
Simple Video Generation using Neural ODEs David Kanaa Vikram S. Voleti Samira Ebrahimi Kahou Christopher Pal 27 20 0 07 Sep 2021
Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization Tiezheng Yu Wenliang Dai Zihan Liu Pascale Fung 37 73 0 06 Sep 2021
Efficient Action Recognition Using Confidence Distillation Shervin Manzuri Shalmani Fei Chiang Ronghuo Zheng 27 6 0 05 Sep 2021
Revisiting 3D ResNets for Video Recognition Xianzhi Du Yeqing Li Huayu Chen Rui Qian Jing Li Irwan Bello 59 17 0 03 Sep 2021
Hierarchical 3D Feature Learning for Pancreas Segmentation Federica Proietto Salanitri Giovanni Bellitto Ismail Irmakci S. Palazzo Ulas Bagci C. Spampinato MedIm 25 10 0 03 Sep 2021
Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition James Hong Matthew Fisher Michael Gharbi Kayvon Fatahalian 3DH 32 37 0 03 Sep 2021
Searching for Two-Stream Models in Multivariate Space for Video Recognition Xinyu Gong Heng Wang Zheng Shou Matt Feiszli Zhangyang Wang Zhicheng Yan 47 9 0 30 Aug 2021
Spatio-Temporal Dynamic Inference Network for Group Activity Recognition Hangjie Yuan Dong Ni Mang Wang AI4CE 17 80 0 26 Aug 2021
Shifted Chunk Transformer for Spatio-Temporal Representational Learning Xuefan Zha Wentao Zhu Tingxun Lv Sen Yang Ji Liu AI4TS ViT 33 27 0 26 Aug 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition Jiawei Chen C. Ho ViT 29 77 0 20 Aug 2021
Weakly-supervised Joint Anomaly Detection and Classification Snehashis Majhi Srijan Das Francois Bremond Ratnakar Dash Pankaj K. Sa 19 20 0 20 Aug 2021
Multi-Object Tracking with Hallucinated and Unlabeled Videos Daniel McKee Bing Shuai Andrew G. Berneshawi Manchen Wang Davide Modolo Svetlana Lazebnik Joseph Tighe VOT 19 7 0 19 Aug 2021
Blindly Assess Quality of In-the-Wild Videos via Quality-aware Pre-training and Motion Perception Bowen Li Weixia Zhang Meng Tian Guangtao Zhai Xianpei Wang 43 120 0 19 Aug 2021