
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 657 papers shown

Busy-Quiet Video Disentangling for Video Classification Guoxi Huang A. Bors 56 7 0 29 Mar 2021
No frame left behind: Full Video Action Recognition X. Liu S. Pintea Fatemeh Karimi Nejadasl Olaf Booij Jan van Gemert 85 41 0 29 Mar 2021
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval Song Liu Haoqi Fan Shengsheng Qian Yiru Chen Wenkui Ding Zhongyuan Wang 106 147 0 28 Mar 2021
Catalyzing Clinical Diagnostic Pipelines Through Volumetric Medical Image Segmentation Using Deep Neural Networks: Past, Present, & Future Teofilo E. Zosa OOD 48 0 0 27 Mar 2021
A Comprehensive Review of the Video-to-Text Problem Jesus Perez-Martin B. Bustos S. Guimarães I. Sipiran Jorge A. Pérez Grethel Coello Said 71 17 0 27 Mar 2021
Learning Comprehensive Motion Representation for Action Recognition Mingyu Wu Boyuan Jiang Donghao Luo Junchi Yan Yabiao Wang Ying Tai Chengjie Wang Jilin Li Feiyue Huang Xiaokang Yang 58 12 0 23 Mar 2021
AdaSGN: Adapting Joint Number and Model Size for Efficient Skeleton-Based Action Recognition Lei Shi Yifan Zhang Jian Cheng Hanqing Lu 73 48 0 22 Mar 2021
Efficient Spatialtemporal Context Modeling for Action Recognition Congqi Cao Yue Lu Yifan Zhang Dengyang Jiang Yanning Zhang 81 4 0 20 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval Maksim Dzabraev M. Kalashnikov Stepan Alekseevich Komkov Aleksandr Petiushko 79 133 0 19 Mar 2021
NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition Pengzhen Ren Gang Xiao Xiaojun Chang Yun Xiao Zhihui Li Xiaojiang Chen ViT 74 4 0 17 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision Andrew Shin Masato Ishii T. Narihira 140 39 0 06 Mar 2021
Unsupervised Motion Representation Enhanced Network for Action Recognition Xiaohang Yang Lingtong Kong Jie Yang 43 4 0 05 Mar 2021
VA-RED$^2$: Video Adaptive Redundancy Reduction Bowen Pan Yikang Shen Camilo Luciano Fosco Chung-Ching Lin A. Andonian Yue Meng Kate Saenko A. Oliva Rogerio Feris 84 19 0 15 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling Jie Lei Linjie Li Luowei Zhou Zhe Gan Tamara L. Berg Joey Tianyi Zhou Jingjing Liu CLIP 179 665 0 11 Feb 2021
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition Yue Meng Yikang Shen Chung-Ching Lin P. Sattigeri Leonid Karlinsky Kate Saenko A. Oliva Rogerio Feris 167 63 0 10 Feb 2021
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius Heng Wang Lorenzo Torresani ViT 420 2,075 0 09 Feb 2021
Bridging the gap between Human Action Recognition and Online Action Detection Alban Main De Boissiere R. Noumeir 97 0 0 21 Jan 2021
Few-shot Action Recognition with Prototype-centered Attentive Learning Xiatian Zhu Antoine Toisoul Juan-Manuel Pérez-Rúa Li Zhang Brais Martínez Tao Xiang 91 53 0 20 Jan 2021
TCLR: Temporal Contrastive Learning for Video Representation I. Dave Rohit Gupta Mamshad Nayeem Rizve Mubarak Shah SSL AI4TS 121 180 0 20 Jan 2021
3D-ANAS: 3D Asymmetric Neural Architecture Search for Fast Hyperspectral Image Classification Haokui Zhang Chengrong Gong Yunpeng Bai Zongwen Bai Ying Li 57 27 0 12 Jan 2021
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts Kunpeng Li Zizhao Zhang Guanhang Wu Xuehan Xiong Chen-Yu Lee Zhichao Lu Y. Fu Tomas Pfister 78 5 0 11 Jan 2021
2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition Hengduo Li Zuxuan Wu Abhinav Shrivastava L. Davis 73 35 0 29 Dec 2020
Global Context Networks Yue Cao Jiarui Xu Stephen Lin Fangyun Wei Han Hu ISeg 117 99 0 24 Dec 2020
Human Action Recognition from Various Data Modalities: A Review Zehua Sun Qiuhong Ke Hossein Rahmani Mohammed Bennamoun Gang Wang Jun Liu MU 170 534 0 22 Dec 2020
TDN: Temporal Difference Networks for Efficient Action Recognition Limin Wang Zhan Tong Bin Ji Gangshan Wu 138 401 0 18 Dec 2020
Multi-shot Temporal Event Localization: a Benchmark Xiaolong Liu Yao Hu S. Bai Fei Ding X. Bai Philip Torr 114 84 0 17 Dec 2020
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation Tarun Kalluri Deepak Pathak Manmohan Chandraker Du Tran VGen 89 148 0 15 Dec 2020
GTA: Global Temporal Attention for Video Action Understanding Bo He Xitong Yang Zuxuan Wu Hao Chen Ser-Nam Lim Abhinav Shrivastava ViT 93 28 0 15 Dec 2020
NUTA: Non-uniform Temporal Aggregation for Action Recognition Xinyu Li Chunhui Liu Bing Shuai Yi Zhu Hao Chen Joseph Tighe ViT 53 16 0 15 Dec 2020
A Comprehensive Study of Deep Video Action Recognition Yi Zhu Xinyu Li Chunhui Liu Mohammadreza Zolfaghari Yuanjun Xiong Chongruo Wu Zhi-Li Zhang Joseph Tighe R. Manmatha Mu Li VLM AI4TS 129 188 0 11 Dec 2020
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction Samyak Jain P. Yarlagadda Shreyank Jyoti Shyamgopal Karthik Subramanian Ramanathan Vineet Gandhi ViT 91 69 0 11 Dec 2020
Look Before you Speak: Visually Contextualized Utterances Paul Hongsuck Seo Arsha Nagrani Cordelia Schmid 99 67 0 10 Dec 2020
Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification Youngwan Lee Hyungil Kim Kimin Yun Jinyoung Moon 51 12 0 01 Dec 2020
Recent Progress in Appearance-based Action Recognition J. Humphreys Zhe Chen Dacheng Tao 55 0 0 25 Nov 2020
A3D: Adaptive 3D Networks for Video Action Recognition Sijie Zhu Taojiannan Yang Matías Mendieta Chong Chen 3DH 70 13 0 24 Nov 2020
Play Fair: Frame Attributions in Video Models Will Price Dima Damen FAtt 55 5 0 24 Nov 2020
QuerYD: A video dataset with high-quality text and audio narrations Andreea-Maria Oncescu João F. Henriques Yang Liu Andrew Zisserman Samuel Albanie VGen 76 11 0 22 Nov 2020
We don't Need Thousand Proposals: Single Shot Actor-Action Detection in Videos A. J. Rana Yogesh S Rawat ViT 44 11 0 22 Nov 2020
3D CNNs with Adaptive Temporal Feature Resolutions Mohsen Fayyaz Emad Bahrami Rad Ali Diba M. Noroozi Ehsan Adeli Luc Van Gool Juergen Gall 3DPC 69 31 0 17 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations Linchao Zhu Yi Yang ViT 134 423 0 14 Nov 2020
Multimodal Pretraining for Dense Video Captioning Gabriel Huang Bo Pang Zhenhai Zhu Clara E. Rivera Radu Soricut 96 87 0 10 Nov 2020
Temporal Stochastic Softmax for 3D CNNs: An Application in Facial Expression Recognition T. Ayral M. Pedersoli Simon L Bacon Eric Granger CVBM 3DH 53 11 0 10 Nov 2020
Mutual Modality Learning for Video Action Classification Stepan Alekseevich Komkov Maksim Dzabraev Aleksandr Petiushko 62 9 0 04 Nov 2020
PV-NAS: Practical Neural Architecture Search for Video Recognition Zihao Wang Chen Lin Lu Sheng Junjie Yan Jing Shao ViT 77 7 0 02 Nov 2020
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning L. Tao Xueting Wang T. Yamasaki VLM SSL 104 14 0 29 Oct 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition Chun-Fu Chen Yikang Shen K. Ramakrishnan Rogerio Feris J. M. Cohn A. Oliva Quanfu Fan 114 99 0 22 Oct 2020
Pose And Joint-Aware Action Recognition Anshul B. Shah Shlok Kumar Mishra Ankan Bansal Jun-Cheng Chen Ramalingam Chellappa Abhinav Shrivastava 137 33 0 16 Oct 2020
Back to the Future: Cycle Encoding Prediction for Self-supervised Contrastive Video Representation Learning Xinyu Yang Majid Mirmehdi T. Burghardt 83 4 0 14 Oct 2020
Boosting Continuous Sign Language Recognition via Cross Modality Augmentation Junfu Pu Wen-gang Zhou Hezhen Hu Houqiang Li 99 114 0 11 Oct 2020
Contrastive Representation Learning: A Framework and Review Phúc H. Lê Khắc Graham Healy Alan F. Smeaton SSL AI4TS 326 720 0 10 Oct 2020