Is Space-Time Attention All You Need for Video Understanding?

v1v2v3v4 (latest)

Is Space-Time Attention All You Need for Video Understanding?

9 February 2021

Gedas Bertasius

Heng Wang

Lorenzo Torresani

ArXiv (abs)PDF HTML Github (1694★)

Papers citing "Is Space-Time Attention All You Need for Video Understanding?"

8 / 108 papers shown

Title
The "something something" video database for learning and evaluating visual common sense Raghav Goyal Samira Ebrahimi Kahou Vincent Michalski Joanna Materzynska S. Westphal ... Moritz Mueller-Freitag F. Hoppe Christian Thurau Ingo Bax Roland Memisevic VLM 87 1,535 0 13 Jun 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 701 131,652 0 12 Jun 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour Priya Goyal Piotr Dollár Ross B. Girshick P. Noordhuis Lukasz Wesolowski Aapo Kyrola Andrew Tulloch Yangqing Jia Kaiming He 3DH 126 3,681 0 08 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset João Carreira Andrew Zisserman 232 8,019 0 22 May 2017
ActionVLAD: Learning spatio-temporal aggregation for action classification Rohit Girdhar Deva Ramanan Abhinav Gupta Josef Sivic Bryan C. Russell AI4TS 72 451 0 10 Apr 2017
Layer Normalization Jimmy Lei Ba J. Kiros Geoffrey E. Hinton 413 10,494 0 21 Jul 2016
Going Deeper with Convolutions Christian Szegedy Wei Liu Yangqing Jia P. Sermanet Scott E. Reed Dragomir Anguelov D. Erhan Vincent Vanhoucke Andrew Rabinovich 468 43,658 0 17 Sep 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan Andrew Zisserman FAtt MDE 1.6K 100,386 0 04 Sep 2014