ResearchTrend.AI
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
    3DH

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 657 papers shown
Busy-Quiet Video Disentangling for Video Classification
Guoxi Huang
A. Bors
56
7
0
29 Mar 2021
No frame left behind: Full Video Action Recognition
X. Liu
S. Pintea
Fatemeh Karimi Nejadasl
Olaf Booij
Jan van Gemert
85
41
0
29 Mar 2021
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval
Song Liu
Haoqi Fan
Shengsheng Qian
Yiru Chen
Wenkui Ding
Zhongyuan Wang
106
147
0
28 Mar 2021
Catalyzing Clinical Diagnostic Pipelines Through Volumetric Medical Image Segmentation Using Deep Neural Networks: Past, Present, & Future
Teofilo E. Zosa
OOD
48
0
0
27 Mar 2021
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
71
17
0
27 Mar 2021
Learning Comprehensive Motion Representation for Action Recognition
Mingyu Wu
Boyuan Jiang
Donghao Luo
Junchi Yan
Yabiao Wang
Ying Tai
Chengjie Wang
Jilin Li
Feiyue Huang
Xiaokang Yang
58
12
0
23 Mar 2021
AdaSGN: Adapting Joint Number and Model Size for Efficient Skeleton-Based Action Recognition
Lei Shi
Yifan Zhang
Jian Cheng
Hanqing Lu
73
48
0
22 Mar 2021
Efficient Spatialtemporal Context Modeling for Action Recognition
Congqi Cao
Yue Lu
Yifan Zhang
Dengyang Jiang
Yanning Zhang
81
4
0
20 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
79
133
0
19 Mar 2021
NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition
Pengzhen Ren
Gang Xiao
Xiaojun Chang
Yun Xiao
Zhihui Li
Xiaojiang Chen
ViT
74
4
0
17 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
140
39
0
06 Mar 2021
Unsupervised Motion Representation Enhanced Network for Action Recognition
Xiaohang Yang
Lingtong Kong
Jie Yang
43
4
0
05 Mar 2021
VA-RED$^2$: Video Adaptive Redundancy Reduction
Bowen Pan
Yikang Shen
Camilo Luciano Fosco
Chung-Ching Lin
A. Andonian
Yue Meng
Kate Saenko
A. Oliva
Rogerio Feris
84
19
0
15 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
179
665
0
11 Feb 2021
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
Yue Meng
Yikang Shen
Chung-Ching Lin
P. Sattigeri
Leonid Karlinsky
Kate Saenko
A. Oliva
Rogerio Feris
167
63
0
10 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
420
2,075
0
09 Feb 2021
Bridging the gap between Human Action Recognition and Online Action Detection
Alban Main De Boissiere
R. Noumeir
97
0
0
21 Jan 2021
Few-shot Action Recognition with Prototype-centered Attentive Learning
Xiatian Zhu
Antoine Toisoul
Juan-Manuel Pérez-Rúa
Li Zhang
Brais Martínez
Tao Xiang
91
53
0
20 Jan 2021
TCLR: Temporal Contrastive Learning for Video Representation
I. Dave
Rohit Gupta
Mamshad Nayeem Rizve
Mubarak Shah
SSL
AI4TS
121
180
0
20 Jan 2021
3D-ANAS: 3D Asymmetric Neural Architecture Search for Fast Hyperspectral Image Classification
Haokui Zhang
Chengrong Gong
Yunpeng Bai
Zongwen Bai
Ying Li
57
27
0
12 Jan 2021
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts
Kunpeng Li
Zizhao Zhang
Guanhang Wu
Xuehan Xiong
Chen-Yu Lee
Zhichao Lu
Y. Fu
Tomas Pfister
78
5
0
11 Jan 2021
2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition
Hengduo Li
Zuxuan Wu
Abhinav Shrivastava
L. Davis
73
35
0
29 Dec 2020
Global Context Networks
Yue Cao
Jiarui Xu
Stephen Lin
Fangyun Wei
Han Hu
ISeg
117
99
0
24 Dec 2020
Human Action Recognition from Various Data Modalities: A Review
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
170
534
0
22 Dec 2020
TDN: Temporal Difference Networks for Efficient Action Recognition
Limin Wang
Zhan Tong
Bin Ji
Gangshan Wu
138
401
0
18 Dec 2020
Multi-shot Temporal Event Localization: a Benchmark
Xiaolong Liu
Yao Hu
S. Bai
Fei Ding
X. Bai
Philip Torr
114
84
0
17 Dec 2020
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
Tarun Kalluri
Deepak Pathak
Manmohan Chandraker
Du Tran
VGen
89
148
0
15 Dec 2020
GTA: Global Temporal Attention for Video Action Understanding
Bo He
Xitong Yang
Zuxuan Wu
Hao Chen
Ser-Nam Lim
Abhinav Shrivastava
ViT
93
28
0
15 Dec 2020
NUTA: Non-uniform Temporal Aggregation for Action Recognition
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Hao Chen
Joseph Tighe
ViT
53
16
0
15 Dec 2020
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
129
188
0
11 Dec 2020
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
Samyak Jain
P. Yarlagadda
Shreyank Jyoti
Shyamgopal Karthik
Subramanian Ramanathan
Vineet Gandhi
ViT
91
69
0
11 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
99
67
0
10 Dec 2020
Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification
Youngwan Lee
Hyungil Kim
Kimin Yun
Jinyoung Moon
51
12
0
01 Dec 2020
Recent Progress in Appearance-based Action Recognition
J. Humphreys
Zhe Chen
Dacheng Tao
55
0
0
25 Nov 2020
A3D: Adaptive 3D Networks for Video Action Recognition
Sijie Zhu
Taojiannan Yang
Matías Mendieta
Chong Chen
3DH
70
13
0
24 Nov 2020
Play Fair: Frame Attributions in Video Models
Will Price
Dima Damen
FAtt
55
5
0
24 Nov 2020
QuerYD: A video dataset with high-quality text and audio narrations
Andreea-Maria Oncescu
João F. Henriques
Yang Liu
Andrew Zisserman
Samuel Albanie
VGen
76
11
0
22 Nov 2020
We don't Need Thousand Proposals: Single Shot Actor-Action Detection in Videos
A. J. Rana
Yogesh S Rawat
ViT
44
11
0
22 Nov 2020
3D CNNs with Adaptive Temporal Feature Resolutions
Mohsen Fayyaz
Emad Bahrami Rad
Ali Diba
M. Noroozi
Ehsan Adeli
Luc Van Gool
Juergen Gall
3DPC
69
31
0
17 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
134
423
0
14 Nov 2020
Multimodal Pretraining for Dense Video Captioning
Gabriel Huang
Bo Pang
Zhenhai Zhu
Clara E. Rivera
Radu Soricut
96
87
0
10 Nov 2020
Temporal Stochastic Softmax for 3D CNNs: An Application in Facial Expression Recognition
T. Ayral
M. Pedersoli
Simon L Bacon
Eric Granger
CVBM
3DH
53
11
0
10 Nov 2020
Mutual Modality Learning for Video Action Classification
Stepan Alekseevich Komkov
Maksim Dzabraev
Aleksandr Petiushko
62
9
0
04 Nov 2020
PV-NAS: Practical Neural Architecture Search for Video Recognition
Zihao Wang
Chen Lin
Lu Sheng
Junjie Yan
Jing Shao
ViT
77
7
0
02 Nov 2020
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning
L. Tao
Xueting Wang
T. Yamasaki
VLM
SSL
104
14
0
29 Oct 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
Chun-Fu Chen
Yikang Shen
K. Ramakrishnan
Rogerio Feris
J. M. Cohn
A. Oliva
Quanfu Fan
114
99
0
22 Oct 2020
Pose And Joint-Aware Action Recognition
Anshul B. Shah
Shlok Kumar Mishra
Ankan Bansal
Jun-Cheng Chen
Ramalingam Chellappa
Abhinav Shrivastava
137
33
0
16 Oct 2020
Back to the Future: Cycle Encoding Prediction for Self-supervised Contrastive Video Representation Learning
Xinyu Yang
Majid Mirmehdi
T. Burghardt
83
4
0
14 Oct 2020
Boosting Continuous Sign Language Recognition via Cross Modality Augmentation
Junfu Pu
Wen-gang Zhou
Hezhen Hu
Houqiang Li
99
114
0
11 Oct 2020
Contrastive Representation Learning: A Framework and Review
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
SSL
AI4TS
326
720
0
10 Oct 2020