2210.13605
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
24 October 2022
Samrudhdhi B. Rangrej
Kevin J. Liang
Tal Hassner
James J. Clark
Papers citing
"GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction"
42 / 42 papers shown
Title
Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling
Khoi-Nguyen C. Mac
Minh Do
Minh Vo
TTA
53
1
0
12 Jul 2022
Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes
Samrudhdhi B. Rangrej
C. Srinidhi
J. Clark
53
12
0
01 Apr 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
201
1,181
0
23 Mar 2022
Glance and Focus Networks for Dynamic Visual Recognition
Gao Huang
Yulin Wang
Kangchen Lv
Haojun Jiang
Wenhui Huang
Pengfei Qi
S. Song
3DH
109
50
0
09 Jan 2022
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
Yulin Wang
Yang Yue
Yuanze Lin
Haojun Jiang
Zihang Lai
V. Kulikov
Nikita Orlov
Humphrey Shi
Gao Huang
53
50
0
28 Dec 2021
A Probabilistic Hard Attention Model For Sequentially Observed Scenes
Samrudhdhi B. Rangrej
James J. Clark
44
12
0
15 Nov 2021
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
94
1,474
0
24 Jun 2021
Anticipative Video Transformer
Rohit Girdhar
Kristen Grauman
ViT
53
210
0
03 Jun 2021
Anticipating human actions by correlating past with the future with Jaccard similarity measures
Basura Fernando
Samitha Herath
EgoV
56
58
0
26 May 2021
Adaptive Focus for Efficient Video Recognition
Yulin Wang
Zhaoxi Chen
Haojun Jiang
Shiji Song
Yizeng Han
Gao Huang
64
99
0
07 May 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
201
2,137
0
29 Mar 2021
Hard-Attention for Scalable Image Classification
Athanasios Papadopoulos
Pawel Korus
N. Memon
87
25
0
20 Feb 2021
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
359
6,731
0
23 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
550
40,739
0
22 Oct 2020
Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
Yulin Wang
Kangchen Lv
Rui Huang
Shiji Song
Le Yang
Gao Huang
3DH
40
150
0
11 Oct 2020
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
125
1,018
0
09 Apr 2020
Meta Pseudo Labels
Hieu H. Pham
Zihang Dai
Qizhe Xie
Minh-Thang Luong
Quoc V. Le
VLM
335
667
0
23 Mar 2020
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Kihyuk Sohn
David Berthelot
Chun-Liang Li
Zizhao Zhang
Nicholas Carlini
E. D. Cubuk
Alexey Kurakin
Han Zhang
Colin Raffel
AAML
153
3,545
0
21 Jan 2020
Self-training with Noisy Student improves ImageNet classification
Qizhe Xie
Minh-Thang Luong
Eduard H. Hovy
Quoc V. Le
NoLa
296
2,387
0
11 Nov 2019
Knowledge Distillation from Internal Representations
Gustavo Aguilar
Yuan Ling
Yu Zhang
Benjamin Yao
Xing Fan
Edward Guo
70
181
0
08 Oct 2019
Saccader: Improving Accuracy of Hard Attention Models for Vision
Gamaleldin F. Elsayed
Simon Kornblith
Quoc V. Le
VLM
42
73
0
20 Aug 2019
Unsupervised Data Augmentation for Consistency Training
Qizhe Xie
Zihang Dai
Eduard H. Hovy
Minh-Thang Luong
Quoc V. Le
124
2,314
0
29 Apr 2019
Video Classification with Channel-Separated Convolutional Networks
Du Tran
Heng Wang
Lorenzo Torresani
Matt Feiszli
3DV
61
586
0
04 Apr 2019
Cross-lingual Language Model Pretraining
Guillaume Lample
Alexis Conneau
73
2,735
0
22 Jan 2019
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
162
3,262
0
10 Dec 2018
TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin
Chuang Gan
Song Han
85
1,683
0
20 Nov 2018
Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points
Fabien Baradel
Christian Wolf
J. Mille
Graham W. Taylor
139
154
0
22 Feb 2018
Human Action Recognition: Pose-based Attention draws focus to Hands
Fabien Baradel
Christian Wolf
J. Mille
130
108
0
20 Dec 2017
A Closer Look at Spatiotemporal Convolutions for Action Recognition
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
196
3,021
0
30 Nov 2017
Temporal Relational Reasoning in Videos
Bolei Zhou
A. Andonian
Aude Oliva
Antonio Torralba
NAI
91
1,037
0
22 Nov 2017
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
273
8,888
0
21 Nov 2017
The "something something" video database for learning and evaluating visual common sense
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
...
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
82
1,529
0
13 Jun 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
651
130,942
0
12 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
219
7,989
0
22 May 2017
Temporal Ensembling for Semi-Supervised Learning
S. Laine
Timo Aila
UQCV
181
2,552
0
07 Oct 2016
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
98
3,825
0
02 Aug 2016
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
Mehdi S. M. Sajjadi
Mehran Javanmardi
Tolga Tasdizen
BDL
80
1,111
0
14 Jun 2016
Spatial Transformer Networks
Max Jaderberg
Karen Simonyan
Andrew Zisserman
Koray Kavukcuoglu
292
7,379
0
05 Jun 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
322
19,609
0
09 Mar 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
324
10,050
0
10 Feb 2015
Learning with Pseudo-Ensembles
Philip Bachman
O. Alsharif
Doina Precup
70
598
0
16 Dec 2014
Recurrent Models of Visual Attention
Volodymyr Mnih
N. Heess
Alex Graves
Koray Kavukcuoglu
VLM
142
3,651
0
24 Jun 2014