Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

22 May 2017

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 3,647 papers shown

Title
Spatial-temporal Concept based Explanation of 3D ConvNets Yi Ji Yu Wang K. Mori Jien Kato 3DPC FAtt 92 7 0 09 Jun 2022
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation Zihan Ding Tianrui Hui Junshi Huang Xiaoming Wei Jizhong Han Si Liu VOS 73 55 0 08 Jun 2022
Generating Long Videos of Dynamic Scenes Tim Brooks Janne Hellsten M. Aittala Ting-Chun Wang Timo Aila J. Lehtinen Xuan Li Alexei A. Efros Tero Karras SyDa 104 114 0 07 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning Jie Lei Tamara L. Berg Joey Tianyi Zhou 96 115 0 07 Jun 2022
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector Lin Sui Chen-Da Liu-Zhang Lixin Gu Feng Han 143 8 0 07 Jun 2022
TadML: A fast temporal action detection with Mechanics-MLP Bowen Deng Dongchang Liu 83 1 0 07 Jun 2022
A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information M. Kowal Mennatullah Siam Md. Amirul Islam Neil D. B. Bruce Richard P. Wildes Konstantinos G. Derpanis 70 26 0 06 Jun 2022
3D Convolutional with Attention for Action Recognition Labina Shrestha Shikha Dubey Farrukh Olimov M. Rafique M. Jeon 38 0 0 05 Jun 2022
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval Xudong Lin Simran Tiwari Shiyuan Huang Manling Li Mike Zheng Shou Heng Ji Shih-Fu Chang 138 21 0 05 Jun 2022
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation Mingjie Li Wenjia Cai Karin Verspoor Shirui Pan Xiaodan Liang Xiaojun Chang MedIm 88 38 0 04 Jun 2022
Revisiting the "Video" in Video-Language Understanding S. Buch Cristobal Eyzaguirre Adrien Gaidon Jiajun Wu L. Fei-Fei Juan Carlos Niebles 102 166 0 03 Jun 2022
Egocentric Video-Language Pretraining Kevin Qinghong Lin Alex Jinpeng Wang Mattia Soldan Michael Wray Rui Yan ... Hongfa Wang Dima Damen Guohao Li Wei Liu Mike Zheng Shou VLM EgoV 104 207 0 03 Jun 2022
Anomaly detection in surveillance videos using transformer based attention model Kapil Deshpande Narinder Singh Punn S. K. Sonbhadra Sonali Agarwal ViT AI4TS 74 12 0 03 Jun 2022
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives Jun Li Junyu Chen Yucheng Tang Ce Wang Bennett A. Landman S. K. Zhou ViT OOD MedIm 181 46 0 02 Jun 2022
A temporal chrominance trigger for clean-label backdoor attack against anti-spoof rebroadcast detection Wei Guo B. Tondi Mauro Barni AAML 66 13 0 02 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications Fei Wu Qingzhong Wang Jian Bian Haoyi Xiong Ning Ding Feixiang Lu Junqing Cheng Dejing Dou AI4TS 95 57 0 02 Jun 2022
Cascaded Video Generation for Videos In-the-Wild Lluis Castrejon Nicolas Ballas Aaron Courville VGen 86 0 0 01 Jun 2022
Dual-stream spatiotemporal networks with feature sharing for monitoring animals in the home cage Ezechukwu I. Nwokedi R. Bains L. Bidaut Xujiong Ye Sara Wells James M. Brown 79 2 0 01 Jun 2022
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering Jiangtong Li Li Niu Liqing Zhang 67 53 0 30 May 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers Wenyi Hong Ming Ding Wendi Zheng Xinghan Liu Jie Tang DiffM 389 633 0 29 May 2022
Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning Yanxing Song Jianzong Wang Tianbo Wu Zhangcheng Huang Jing Xiao CVBM 131 2 0 29 May 2022
Future Transformer for Long-term Action Anticipation Dayoung Gong Joonseok Lee Manjin Kim S. Ha Minsu Cho AI4TS 53 66 0 27 May 2022
PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences Hehe Fan Xin Yu Yuhang Ding Yi Yang Mohan Kankanhalli 3DPC 190 113 0 27 May 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition Shoufa Chen Chongjian Ge Zhan Tong Jiangliu Wang Yibing Song Jue Wang Ping Luo 259 706 0 26 May 2022
Do we really need temporal convolutions in action segmentation? Dazhao Du Fuchun Sun Yu Li Zhongang Qi Hui Xiong Ying Shan ViT 72 17 0 26 May 2022
Learning What and Where: Disentangling Location and Identity Tracking Without Supervision Manuel Traub S. Otte Tobias Menge Matthias Karlbauer Jannik Thummel Martin Volker Butz 115 20 0 26 May 2022
VIDI: A Video Dataset of Incidents Duygu Sesver Alp Eren Gençoglu Ç. Yildiz Zehra Günindi Faeze Habibi Z. A. Yazici H. K. Ekenel 68 4 0 26 May 2022
You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos Xin Sun Xinyu Wang Jialin Gao Qiong Liu Xiaoping Zhou 96 34 0 25 May 2022
Detection of Fights in Videos: A Comparison Study of Anomaly Detection and Action Recognition Weijun Tan Jingfeng Liu 71 8 0 23 May 2022
Deep Learning for Visual Speech Analysis: A Survey Changchong Sheng Gangyao Kuang L. Bai Chen Hou Y. Guo Xin Xu M. Pietikäinen Li Liu VLM 98 36 0 22 May 2022
GL-RG: Global-Local Representation Granularity for Video Captioning Liqi Yan Qifan Wang Yiming Cui Fuli Feng Xiaojun Quan Xinming Zhang Dongfang Liu 125 59 0 22 May 2022
Structured Attention Composition for Temporal Action Localization Le Yang Junwei Han Tao Zhao Nian Liu Dingwen Zhang 84 17 0 20 May 2022
Cross-Enhancement Transformer for Action Segmentation Jiahui Wang Zhenyou Wang Shanna Zhuang Hui Wang ViT 97 23 0 19 May 2022
PYSKL: Towards Good Practices for Skeleton Action Recognition Haodong Duan Jiaqi Wang Kai-xiang Chen Dahua Lin VLM 88 147 0 19 May 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval Max Bain Arsha Nagrani Gül Varol Andrew Zisserman CLIP 202 62 0 17 May 2022
Learnable Optimal Sequential Grouping for Video Scene Detection Daniel Rotman Yevgeny Yaroker Elad Amrani Udi Barzelay Rami Ben-Ari 35 10 0 17 May 2022
ETAD: Training Action Detection End to End on a Laptop Shuming Liu Mengmeng Xu Chen Zhao Xu Zhao Guohao Li 78 7 0 14 May 2022
Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild Fuyan Ma Bin Sun Shutao Li ViT 57 31 0 10 May 2022
Scaling up sign spotting through sign language dictionaries Gül Varol Liliane Momeni Samuel Albanie Triantafyllos Afouras Andrew Zisserman 71 15 0 09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders Peng Gao Teli Ma Hongsheng Li Ziyi Lin Jifeng Dai Yu Qiao ViT 94 128 0 08 May 2022
Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study Liqun Lin Zheng Wang Jiachen He Weiling Chen Yiwen Xu Tiesong Zhao 84 7 0 07 May 2022
Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement Bing Li Jiaxin Chen Dongming Zhang Xiuguo Bao Di Huang 56 15 0 07 May 2022
An Empirical Study on Activity Recognition in Long Surgical Videos Zhuohong He A. Mottaghi Aidean Sharghi Muhammad Abdullah Jamal Omid Mohareri 90 12 0 05 May 2022
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection Mingdong Yang Guo Chen Yin-Dong Zheng Tong Lu Limin Wang 100 48 0 05 May 2022
Deep Neural Network approaches for Analysing Videos of Music Performances F. Liwicki Richa Upadhyay Prakash Chandra Chhipa Killian Murphy F. Visi S. Östersjö Marcus Liwicki 64 1 0 05 May 2022
ANUBIS: Skeleton Action Recognition Dataset, Review, and Benchmark Zhenyue Qin Yang Liu Madhawa Perera Tom Gedeon Pan Ji Dongwoo Kim Saeed Anwar 60 4 0 04 May 2022
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition Haodong Duan Nanxuan Zhao Kai-xiang Chen Dahua Lin ViT AI4TS 82 19 0 04 May 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition Xianhang Li Huiyu Wang Chen Wei Jieru Mei Alan Yuille Yuyin Zhou Cihang Xie 74 0 0 03 May 2022
Cross-modal Representation Learning for Zero-shot Action Recognition Chung-Ching Lin Kevin Qinghong Lin Linjie Li Lijuan Wang Zicheng Liu ViT 62 29 0 03 May 2022
Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization Qinying Liu Zilei Wang Ruoxi Chen Zhilin Li 74 4 0 01 May 2022