Learning to track for spatio-temporal action localization

5 June 2015

Philippe Weinzaepfel

Zaïd Harchaoui

Cordelia Schmid

Papers citing "Learning to track for spatio-temporal action localization"

Title | Authors | Tags | Counts | Date
Action tube generation by person query matching for spatio-temporal action detection | Kazuki Omi, Jion Oshima, Toru Tamaki | - | 136 0 0 | 17 Mar 2025
A Novel Online Action Detection Framework from Untrimmed Video Streams | Da-Hye Yoon, Nam-Gyu Cho, Seong-Whan Lee | - | 73 20 0 | 17 Mar 2020
Discovering Spatio-Temporal Action Tubes | Yuancheng Ye, Xiaodong Yang, Yingli Tian | - | 65 14 0 | 29 Nov 2018
Action Tubelet Detector for Spatio-Temporal Action Localization | Vicky Kalogeiton, Philippe Weinzaepfel, V. Ferrari, Cordelia Schmid | - | 68 325 0 | 04 May 2017
DAP3D-Net: Where, What and How Actions Occur in Videos? | Li Liu, Yi Zhou, Ling Shao | - | 57 14 0 | 10 Feb 2016
A robust and efficient video representation for action recognition | Heng Wang, Dan Oneaţă, Jakob Verbeek, Cordelia Schmid | - | 58 326 0 | 21 Apr 2015
What makes for effective detection proposals? | J. Hosang, Rodrigo Benenson, Piotr Dollár, Bernt Schiele | ObjD | 110 735 0 | 17 Feb 2015
Learning Spatiotemporal Features with 3D Convolutional Networks | Du Tran, Lubomir D. Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri | 3DPC | 77 411 0 | 02 Dec 2014
Finding Action Tubes | Georgia Gkioxari, Jitendra Malik | - | 77 599 0 | 21 Nov 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description | Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, S. Guadarrama, Kate Saenko, Trevor Darrell | VLM | 173 6,057 0 | 17 Nov 2014
Caffe: Convolutional Architecture for Fast Feature Embedding | Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B. Girshick, S. Guadarrama, Trevor Darrell | VLM, BDL, 3DV | 280 14,715 0 | 20 Jun 2014
Two-Stream Convolutional Networks for Action Recognition in Videos | Karen Simonyan, Andrew Zisserman | - | 259 7,545 0 | 09 Jun 2014
Rich feature hierarchies for accurate object detection and semantic segmentation | Ross B. Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik | ObjD | 291 26,223 0 | 11 Nov 2013
Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks | Alessandro Giusti, D. Ciresan, Jonathan Masci, L. Gambardella, Jürgen Schmidhuber | - | 178 346 0 | 07 Feb 2013
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild | K. Soomro, Amir Zamir, M. Shah | CLIP, VGen | 160 6,164 0 | 03 Dec 2012