GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction

24 October 2022

Samrudhdhi B. Rangrej

Papers cited by "GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction"

Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling Khoi-Nguyen C. Mac Minh Do Minh Vo TTA 53 1 0 12 Jul 2022
Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes Samrudhdhi B. Rangrej C. Srinidhi J. Clark 53 12 0 01 Apr 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training Zhan Tong Yibing Song Jue Wang Limin Wang ViT 201 1,181 0 23 Mar 2022
Glance and Focus Networks for Dynamic Visual Recognition Gao Huang Yulin Wang Kangchen Lv Haojun Jiang Wenhui Huang Pengfei Qi S. Song 3DH 109 50 0 09 Jan 2022
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition Yulin Wang Yang Yue Yuanze Lin Haojun Jiang Zihang Lai V. Kulikov Nikita Orlov Humphrey Shi Gao Huang 53 50 0 28 Dec 2021
A Probabilistic Hard Attention Model For Sequentially Observed Scenes Samrudhdhi B. Rangrej James J. Clark 44 12 0 15 Nov 2021
Video Swin Transformer Ze Liu Jia Ning Yue Cao Yixuan Wei Zheng Zhang Stephen Lin Han Hu ViT 94 1,474 0 24 Jun 2021
Anticipative Video Transformer Rohit Girdhar Kristen Grauman ViT 53 210 0 03 Jun 2021
Anticipating human actions by correlating past with the future with Jaccard similarity measures Basura Fernando Samitha Herath EgoV 56 58 0 26 May 2021
Adaptive Focus for Efficient Video Recognition Yulin Wang Zhaoxi Chen Haojun Jiang Shiji Song Yizeng Han Gao Huang 64 99 0 07 May 2021
ViViT: A Video Vision Transformer Anurag Arnab Mostafa Dehghani G. Heigold Chen Sun Mario Lucic Cordelia Schmid ViT 201 2,137 0 29 Mar 2021
Hard-Attention for Scalable Image Classification Athanasios Papadopoulos Pawel Korus N. Memon 87 25 0 20 Feb 2021
Training data-efficient image transformers & distillation through attention Hugo Touvron Matthieu Cord Matthijs Douze Francisco Massa Alexandre Sablayrolles Hervé Jégou ViT 359 6,731 0 23 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 550 40,739 0 22 Oct 2020
Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification Yulin Wang Kangchen Lv Rui Huang Shiji Song Le Yang Gao Huang 3DH 40 150 0 11 Oct 2020
X3D: Expanding Architectures for Efficient Video Recognition Christoph Feichtenhofer 125 1,018 0 09 Apr 2020
Meta Pseudo Labels Hieu H. Pham Zihang Dai Qizhe Xie Minh-Thang Luong Quoc V. Le VLM 335 667 0 23 Mar 2020
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence Kihyuk Sohn David Berthelot Chun-Liang Li Zizhao Zhang Nicholas Carlini E. D. Cubuk Alexey Kurakin Han Zhang Colin Raffel AAML 153 3,545 0 21 Jan 2020
Self-training with Noisy Student improves ImageNet classification Qizhe Xie Minh-Thang Luong Eduard H. Hovy Quoc V. Le NoLa 296 2,387 0 11 Nov 2019
Knowledge Distillation from Internal Representations Gustavo Aguilar Yuan Ling Yu Zhang Benjamin Yao Xing Fan Edward Guo 70 181 0 08 Oct 2019
Saccader: Improving Accuracy of Hard Attention Models for Vision Gamaleldin F. Elsayed Simon Kornblith Quoc V. Le VLM 42 73 0 20 Aug 2019
Unsupervised Data Augmentation for Consistency Training Qizhe Xie Zihang Dai Eduard H. Hovy Minh-Thang Luong Quoc V. Le 124 2,314 0 29 Apr 2019
Video Classification with Channel-Separated Convolutional Networks Du Tran Heng Wang Lorenzo Torresani Matt Feiszli 3DV 61 586 0 04 Apr 2019
Cross-lingual Language Model Pretraining Guillaume Lample Alexis Conneau 73 2,735 0 22 Jan 2019
SlowFast Networks for Video Recognition Christoph Feichtenhofer Haoqi Fan Jitendra Malik Kaiming He 162 3,262 0 10 Dec 2018
TSM: Temporal Shift Module for Efficient Video Understanding Ji Lin Chuang Gan Song Han 85 1,683 0 20 Nov 2018
Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points Fabien Baradel Christian Wolf J. Mille Graham W. Taylor 139 154 0 22 Feb 2018
Human Action Recognition: Pose-based Attention draws focus to Hands Fabien Baradel Christian Wolf J. Mille 130 108 0 20 Dec 2017
A Closer Look at Spatiotemporal Convolutions for Action Recognition Du Tran Heng Wang Lorenzo Torresani Jamie Ray Yann LeCun Manohar Paluri 196 3,021 0 30 Nov 2017
Temporal Relational Reasoning in Videos Bolei Zhou A. Andonian Aude Oliva Antonio Torralba NAI 91 1,037 0 22 Nov 2017
Non-local Neural Networks Xiaolong Wang Ross B. Girshick Abhinav Gupta Kaiming He OffRL 273 8,888 0 21 Nov 2017
The "something something" video database for learning and evaluating visual common sense Raghav Goyal Samira Ebrahimi Kahou Vincent Michalski Joanna Materzynska S. Westphal ... Moritz Mueller-Freitag F. Hoppe Christian Thurau Ingo Bax Roland Memisevic VLM 82 1,529 0 13 Jun 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 651 130,942 0 12 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset João Carreira Andrew Zisserman 219 7,989 0 22 May 2017
Temporal Ensembling for Semi-Supervised Learning S. Laine Timo Aila UQCV 181 2,552 0 07 Oct 2016
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition Limin Wang Yuanjun Xiong Zhe Wang Yu Qiao Dahua Lin Xiaoou Tang Luc Van Gool ViT 98 3,825 0 02 Aug 2016
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning Mehdi S. M. Sajjadi Mehran Javanmardi Tolga Tasdizen BDL 80 1,111 0 14 Jun 2016
Spatial Transformer Networks Max Jaderberg Karen Simonyan Andrew Zisserman Koray Kavukcuoglu 292 7,379 0 05 Jun 2015
Distilling the Knowledge in a Neural Network Geoffrey E. Hinton Oriol Vinyals J. Dean FedML 322 19,609 0 09 Mar 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Ke Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhutdinov R. Zemel Yoshua Bengio DiffM 324 10,050 0 10 Feb 2015
Learning with Pseudo-Ensembles Philip Bachman O. Alsharif Doina Precup 70 598 0 16 Dec 2014
Recurrent Models of Visual Attention Volodymyr Mnih N. Heess Alex Graves Koray Kavukcuoglu VLM 142 3,651 0 24 Jun 2014