ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1712.04851
  4. Cited By
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in
  Video Classification

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Z. Tu
Kevin Patrick Murphy
    3DH
ArXivPDFHTML

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 650 papers shown
Title
Efficient Spatialtemporal Context Modeling for Action Recognition
Efficient Spatialtemporal Context Modeling for Action Recognition
Congqi Cao
Yue Lu
Yifan Zhang
D. Jiang
Yanning Zhang
29
4
0
20 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
24
128
0
19 Mar 2021
NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex
  Action Recognition
NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition
Pengzhen Ren
Gang Xiao
Xiaojun Chang
Yun Xiao
Zhihui Li
Xiaojiang Chen
ViT
26
4
0
17 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal
  Tasks with Language and Vision
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
35
37
0
06 Mar 2021
Unsupervised Motion Representation Enhanced Network for Action
  Recognition
Unsupervised Motion Representation Enhanced Network for Action Recognition
Xiaohang Yang
Lingtong Kong
Jie Yang
16
4
0
05 Mar 2021
VA-RED$^2$: Video Adaptive Redundancy Reduction
VA-RED2^22: Video Adaptive Redundancy Reduction
Bowen Pan
Yikang Shen
Camilo Luciano Fosco
Chung-Ching Lin
A. Andonian
Yue Meng
Kate Saenko
A. Oliva
Rogerio Feris
15
19
0
15 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse
  Sampling
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
46
647
0
11 Feb 2021
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action
  Recognition
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
Yue Meng
Yikang Shen
Chung-Ching Lin
P. Sattigeri
Leonid Karlinsky
Kate Saenko
A. Oliva
Rogerio Feris
70
62
0
10 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,984
0
09 Feb 2021
Bridging the gap between Human Action Recognition and Online Action
  Detection
Bridging the gap between Human Action Recognition and Online Action Detection
Alban Main De Boissiere
R. Noumeir
22
0
0
21 Jan 2021
Few-shot Action Recognition with Prototype-centered Attentive Learning
Few-shot Action Recognition with Prototype-centered Attentive Learning
Xiatian Zhu
Antoine Toisoul
Juan-Manuel Prez-Ra
Li Zhang
Brais Martínez
Tao Xiang
39
53
0
20 Jan 2021
TCLR: Temporal Contrastive Learning for Video Representation
TCLR: Temporal Contrastive Learning for Video Representation
I. Dave
Rohit Gupta
Mamshad Nayeem Rizve
Mubarak Shah
SSL
AI4TS
34
175
0
20 Jan 2021
3D-ANAS: 3D Asymmetric Neural Architecture Search for Fast Hyperspectral
  Image Classification
3D-ANAS: 3D Asymmetric Neural Architecture Search for Fast Hyperspectral Image Classification
Haokui Zhang
Chengrong Gong
Yunpeng Bai
Zongwen Bai
Ying Li
14
27
0
12 Jan 2021
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts
Kunpeng Li
Zizhao Zhang
Guanhang Wu
Xuehan Xiong
Chen-Yu Lee
Zhichao Lu
Y. Fu
Tomas Pfister
29
5
0
11 Jan 2021
2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video
  Recognition
2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition
Hengduo Li
Zuxuan Wu
Abhinav Shrivastava
L. Davis
27
35
0
29 Dec 2020
Global Context Networks
Global Context Networks
Yue Cao
Jiarui Xu
Stephen Lin
Fangyun Wei
Han Hu
ISeg
36
96
0
24 Dec 2020
Human Action Recognition from Various Data Modalities: A Review
Human Action Recognition from Various Data Modalities: A Review
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
53
504
0
22 Dec 2020
TDN: Temporal Difference Networks for Efficient Action Recognition
TDN: Temporal Difference Networks for Efficient Action Recognition
Limin Wang
Zhan Tong
Bin Ji
Gangshan Wu
23
391
0
18 Dec 2020
Multi-shot Temporal Event Localization: a Benchmark
Multi-shot Temporal Event Localization: a Benchmark
Xiaolong Liu
Yao Hu
S. Bai
Fei Ding
X. Bai
Philip Torr
46
81
0
17 Dec 2020
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
Tarun Kalluri
Deepak Pathak
Manmohan Chandraker
Du Tran
VGen
25
142
0
15 Dec 2020
GTA: Global Temporal Attention for Video Action Understanding
GTA: Global Temporal Attention for Video Action Understanding
Bo He
Xitong Yang
Zuxuan Wu
Hao Chen
Ser-Nam Lim
Abhinav Shrivastava
ViT
33
27
0
15 Dec 2020
NUTA: Non-uniform Temporal Aggregation for Action Recognition
NUTA: Non-uniform Temporal Aggregation for Action Recognition
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Hao Chen
Joseph Tighe
ViT
16
16
0
15 Dec 2020
A Comprehensive Study of Deep Video Action Recognition
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
38
185
0
11 Dec 2020
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency
  Prediction
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
Samyak Jain
P. Yarlagadda
Shreyank Jyoti
Shyamgopal Karthik
Subramanian Ramanathan
Vineet Gandhi
ViT
29
66
0
11 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
21
66
0
10 Dec 2020
Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization
  for Efficient Video Classification
Diverse Temporal Aggregation and Depthwise Spatiotemporal Factorization for Efficient Video Classification
Youngwan Lee
Hyungil Kim
Kimin Yun
Jinyoung Moon
26
12
0
01 Dec 2020
Recent Progress in Appearance-based Action Recognition
Recent Progress in Appearance-based Action Recognition
J. Humphreys
Zhe Chen
Dacheng Tao
24
0
0
25 Nov 2020
A3D: Adaptive 3D Networks for Video Action Recognition
A3D: Adaptive 3D Networks for Video Action Recognition
Sijie Zhu
Taojiannan Yang
Matías Mendieta
Chong Chen
3DH
29
12
0
24 Nov 2020
Play Fair: Frame Attributions in Video Models
Play Fair: Frame Attributions in Video Models
Will Price
Dima Damen
FAtt
23
5
0
24 Nov 2020
QuerYD: A video dataset with high-quality text and audio narrations
QuerYD: A video dataset with high-quality text and audio narrations
Andreea-Maria Oncescu
João F. Henriques
Yang Liu
Andrew Zisserman
Samuel Albanie
VGen
16
11
0
22 Nov 2020
We don't Need Thousand Proposals$\colon$ Single Shot Actor-Action
  Detection in Videos
We don't Need Thousand Proposals ⁣:\colon: Single Shot Actor-Action Detection in Videos
A. J. Rana
Yogesh S Rawat
ViT
13
11
0
22 Nov 2020
3D CNNs with Adaptive Temporal Feature Resolutions
3D CNNs with Adaptive Temporal Feature Resolutions
Mohsen Fayyaz
Emad Bahrami Rad
Ali Diba
M. Noroozi
Ehsan Adeli
Luc Van Gool
Juergen Gall
3DPC
21
30
0
17 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
46
417
0
14 Nov 2020
Multimodal Pretraining for Dense Video Captioning
Multimodal Pretraining for Dense Video Captioning
Gabriel Huang
Bo Pang
Zhenhai Zhu
Clara E. Rivera
Radu Soricut
21
81
0
10 Nov 2020
Temporal Stochastic Softmax for 3D CNNs: An Application in Facial
  Expression Recognition
Temporal Stochastic Softmax for 3D CNNs: An Application in Facial Expression Recognition
T. Ayral
M. Pedersoli
Simon L Bacon
Eric Granger
CVBM
3DH
13
11
0
10 Nov 2020
Mutual Modality Learning for Video Action Classification
Mutual Modality Learning for Video Action Classification
Stepan Alekseevich Komkov
Maksim Dzabraev
Aleksandr Petiushko
27
9
0
04 Nov 2020
PV-NAS: Practical Neural Architecture Search for Video Recognition
PV-NAS: Practical Neural Architecture Search for Video Recognition
Zihao Wang
Chen Lin
Lu Sheng
Junjie Yan
Jing Shao
ViT
14
7
0
02 Nov 2020
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised
  Video Representation Leaning
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning
L. Tao
Xueting Wang
T. Yamasaki
VLM
SSL
23
14
0
29 Oct 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action
  Recognition
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
Chun-Fu Chen
Yikang Shen
K. Ramakrishnan
Rogerio Feris
J. M. Cohn
A. Oliva
Quanfu Fan
23
95
0
22 Oct 2020
Pose And Joint-Aware Action Recognition
Pose And Joint-Aware Action Recognition
Anshul B. Shah
Shlok Kumar Mishra
Ankan Bansal
Jun-Cheng Chen
Ramalingam Chellappa
Abhinav Shrivastava
39
33
0
16 Oct 2020
Back to the Future: Cycle Encoding Prediction for Self-supervised
  Contrastive Video Representation Learning
Back to the Future: Cycle Encoding Prediction for Self-supervised Contrastive Video Representation Learning
Xinyu Yang
Majid Mirmehdi
T. Burghardt
27
4
0
14 Oct 2020
Boosting Continuous Sign Language Recognition via Cross Modality
  Augmentation
Boosting Continuous Sign Language Recognition via Cross Modality Augmentation
Junfu Pu
Wen-gang Zhou
Hezhen Hu
Houqiang Li
43
108
0
11 Oct 2020
Contrastive Representation Learning: A Framework and Review
Contrastive Representation Learning: A Framework and Review
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
SSL
AI4TS
178
686
0
10 Oct 2020
Support-set bottlenecks for video-text representation learning
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
22
244
0
06 Oct 2020
Hierarchical Domain-Adapted Feature Learning for Video Saliency
  Prediction
Hierarchical Domain-Adapted Feature Learning for Video Saliency Prediction
Giovanni Bellitto
Federica Proietto Salanitri
S. Palazzo
Francesco Rundo
Daniela Giordano
C. Spampinato
MDE
15
52
0
02 Oct 2020
PERF-Net: Pose Empowered RGB-Flow Net
PERF-Net: Pose Empowered RGB-Flow Net
Yinxiao Li
Zhichao Lu
Xuehan Xiong
Jonathan Huang
3DH
37
17
0
28 Sep 2020
On the spatiotemporal behavior in biology-mimicking computing systems
On the spatiotemporal behavior in biology-mimicking computing systems
J. Végh
Ádám-József Berki
14
6
0
18 Sep 2020
Discovering Dynamic Salient Regions for Spatio-Temporal Graph Neural
  Networks
Discovering Dynamic Salient Regions for Spatio-Temporal Graph Neural Networks
Iulia Duta
Andrei Liviu Nicolicioiu
Marius Leordeanu
26
6
0
17 Sep 2020
Multi-Label Activity Recognition using Activity-specific Features and
  Activity Correlations
Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations
Yanyi Zhang
Xinyu Li
I. Marsic
HAI
28
23
0
16 Sep 2020
Online Spatiotemporal Action Detection and Prediction via Causal
  Representations
Online Spatiotemporal Action Detection and Prediction via Causal Representations
Gurkirt Singh
3DPC
CML
21
0
0
31 Aug 2020
Previous
123...101112139
Next