Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

22 May 2017

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 3,647 papers shown

iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering Aman Chadha Gurneet Arora Navpreet Kaloty 66 37 0 16 Nov 2020
Direct Classification of Emotional Intensity Jacob Ouyang I. Galatzer-Levy V. Koesmahargyo Li Zhang 27 0 0 15 Nov 2020
SALAD: Self-Assessment Learning for Action Detection Guillaume Vaudaux-Ruth Adrien Chan-Hon-Tong Catherine Achard 44 8 0 13 Nov 2020
Universal Embeddings for Spatio-Temporal Tagging of Self-Driving Logs Sean Segal Eric Kee Wenjie Luo Abbas Sadat Ersin Yumer R. Urtasun 42 11 0 12 Nov 2020
Multimodal Pretraining for Dense Video Captioning Gabriel Huang Bo Pang Zhenhai Zhu Clara E. Rivera Radu Soricut 96 87 0 10 Nov 2020
Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos Di Yang Rui Dai Yaohui Wang Rupayan Mallick Luca Minciullo Gianpiero Francesca Francois Bremond 81 16 0 10 Nov 2020
Temporal Stochastic Softmax for 3D CNNs: An Application in Facial Expression Recognition T. Ayral M. Pedersoli Simon L Bacon Eric Granger CVBM 3DH 53 11 0 10 Nov 2020
STCNet: Spatio-Temporal Cross Network for Industrial Smoke Detection Yichao Cao Qingfei Tang Xiaobo Lu Fan Li Jinde Cao 27 3 0 10 Nov 2020
Multi-Temporal Convolutions for Human Action Recognition in Videos Alexandros Stergiou R. Poppe 75 1 0 08 Nov 2020
Integrating Human Gaze into Attention for Egocentric Activity Recognition Kyle Min Jason J. Corso 68 43 0 08 Nov 2020
AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection Hao Zhu Chaoyou Fu Qianyi Wu Wayne Wu Chao Qian Ran He 74 32 0 05 Nov 2020
Mutual Modality Learning for Video Action Classification Stepan Alekseevich Komkov Maksim Dzabraev Aleksandr Petiushko 65 9 0 04 Nov 2020
S3-Net: A Fast and Lightweight Video Scene Understanding Network by Single-shot Segmentation Yuan Cheng Yuchao Yang Hai-Bao Chen Ngai Wong Hao Yu 3DPC 51 3 0 04 Nov 2020
Content-based Analysis of the Cultural Differences between TikTok and Douyin Li-yao Sun Haoqi Zhang Songyang Zhang Jiebo Luo 44 24 0 03 Nov 2020
PV-NAS: Practical Neural Architecture Search for Video Recognition Zihao Wang Chen Lin Lu Sheng Junjie Yan Jing Shao ViT 77 7 0 02 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning Simon Ging Mohammadreza Zolfaghari Hamed Pirsiavash Thomas Brox ViT CLIP 81 174 0 01 Nov 2020
Efficient Pipelines for Vision-Based Context Sensing Xiaochen Liu 103 0 0 01 Nov 2020
A Survey on Contrastive Self-supervised Learning Ashish Jaiswal Ashwin Ramesh Babu Mohammad Zaki Zadeh Debapriya Banerjee F. Makedon SSL 159 1,415 0 31 Oct 2020
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning L. Tao Xueting Wang T. Yamasaki VLM SSL 104 14 0 29 Oct 2020
Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection Rui Dai Srijan Das Saurav Sharma Luca Minciullo Lorenzo Garattoni Francois Bremond Gianpiero Francesca 69 53 0 28 Oct 2020
ElderSim: A Synthetic Data Generation Platform for Human Action Recognition in Eldercare Applications Hochul Hwang Cheongjae Jang Geonwoo Park Junghyun Cho Ig-Jae Kim 113 73 0 28 Oct 2020
Deep DA for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labeled Videos G. Praveen Eric Granger P. Cardinal 33 2 0 28 Oct 2020
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning Peihao Chen Deng Huang Dongliang He Xiang Long Runhao Zeng Shilei Wen Mingkui Tan Chuang Gan SSL 73 134 0 27 Oct 2020
Spatio-temporal Features for Generalized Detection of Deepfake Videos Ipek Ganiyusufoglu L. Ngô N. Savov Sezer Karaoglu Theo Gevers 59 42 0 22 Oct 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition Chun-Fu Chen Yikang Shen K. Ramakrishnan Rogerio Feris J. M. Cohn A. Oliva Quanfu Fan 114 99 0 22 Oct 2020
Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization Yuanhao Zhai Le Wang Wei Tang Qilin Zhang Junsong Yuan G. Hua 104 139 0 22 Oct 2020
A Short Note on the Kinetics-700-2020 Human Action Dataset Lucas Smaira João Carreira Eric Noland Ellen Clancy Amy Wu Andrew Zisserman 91 139 0 21 Oct 2020
TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog Wubo Li Dongwei Jiang Wei Zou Xiangang Li 47 6 0 21 Oct 2020
AttendAffectNet: Self-Attention based Networks for Predicting Affective Responses from Movies Ha Thi Phuong Thao Balamurali B.T. Dorien Herremans Gemma Roig 45 7 0 21 Oct 2020
Pedestrian Intention Prediction: A Multi-task Perspective Smail Ait Bouhsain Saeed Saadatnejad Alexandre Alahi 91 28 0 20 Oct 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues Hung Le Doyen Sahoo Nancy F. Chen Guosheng Lin 117 31 0 20 Oct 2020
Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition Yuqian Fu Li Zhang Junke Wang Yanwei Fu Yu-Gang Jiang 88 99 0 20 Oct 2020
Unsupervised Domain Adaptation for Spatio-Temporal Action Localization Nakul Agarwal Yi-Ting Chen Behzad Dariush Ming-Hsuan Yang 79 8 0 19 Oct 2020
Hierarchical Autoregressive Modeling for Neural Video Compression Ruihan Yang Yibo Yang Joseph Marino Stephan Mandt BDL VGen 185 47 0 19 Oct 2020
Temporal Binary Representation for Event-Based Action Recognition Simone Undri Innocenti Federico Becattini F. Pernici A. Bimbo 119 69 0 18 Oct 2020
Pose And Joint-Aware Action Recognition Anshul B. Shah Shlok Kumar Mishra Ankan Bansal Jun-Cheng Chen Ramalingam Chellappa Abhinav Shrivastava 141 33 0 16 Oct 2020
Egok360: A 360 Egocentric Kinetic Human Activity Video Dataset Keshav Bhandari Mario A. DeLaGarza Ziliang Zong Hugo Latapie Yan Yan EgoV 90 6 0 15 Oct 2020
What is More Likely to Happen Next? Video-and-Language Future Event Prediction Jie Lei Licheng Yu Tamara L. Berg Joey Tianyi Zhou 104 73 0 15 Oct 2020
Unsupervised Video Anomaly Detection via Normalizing Flows with Implicit Latent Features Myeongah Cho Taeoh Kim Woojin Kim Suhwan Cho Sangyoun Lee 98 95 0 15 Oct 2020
Pose Refinement Graph Convolutional Network for Skeleton-based Action Recognition Shijie Li Jinhui Yi Yazan Abu Farha Juergen Gall 50 36 0 14 Oct 2020
Back to the Future: Cycle Encoding Prediction for Self-supervised Contrastive Video Representation Learning Xinyu Yang Majid Mirmehdi T. Burghardt 96 4 0 14 Oct 2020
WeightAlign: Normalizing Activations by Weight Alignment Xiangwei Shi Yun-qiang Li X. Liu Jan van Gemert 41 0 0 14 Oct 2020
Video Action Understanding Matthew Hutchinson V. Gadepally 122 22 0 13 Oct 2020
DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video Cristian Rodriguez-Opazo Edison Marrese-Taylor Basura Fernando Hongdong Li Stephen Gould 195 10 0 13 Oct 2020
The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos in an Industrial-like Domain Francesco Ragusa Antonino Furnari S. Livatino G. Farinella EgoV 63 102 0 12 Oct 2020
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation Dongxu Li Chenchen Xu Xin Yu Kaihao Zhang Ben Swift H. Suominen Hongdong Li SLR 60 124 0 12 Oct 2020
Boosting Continuous Sign Language Recognition via Cross Modality Augmentation Junfu Pu Wen-gang Zhou Hezhen Hu Houqiang Li 102 114 0 11 Oct 2020
Deep Sequence Learning for Video Anticipation: From Discrete and Deterministic to Continuous and Stochastic S. Aliakbarian AI4TS 37 0 0 09 Oct 2020
Watch, read and lookup: learning to spot signs from multiple supervisors Liliane Momeni Gül Varol Samuel Albanie Triantafyllos Afouras Andrew Zisserman 75 44 0 08 Oct 2020
Reconfigurable Cyber-Physical System for Lifestyle Video-Monitoring via Deep Learning Daniel Deniz Francisco Barranco J. Isern Eduardo Ros 26 9 0 07 Oct 2020