Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
    3DH

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 657 papers shown
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Yutong Chen
Fangyun Wei
Xiao Sun
Zhirong Wu
Stephen Lin
SLR
88
104
0
08 Mar 2022
End-to-End Semi-Supervised Learning for Video Action Detection
Akash Kumar
Yogesh S Rawat
77
32
0
08 Mar 2022
Behavior Recognition Based on the Integration of Multigranular Motion Features
Lizong Zhang
Yiming Wang
Bei Hui
Xiu Zhang
Sijuan Liu
Shuxin Feng
32
0
0
07 Mar 2022
Motion-driven Visual Tempo Learning for Video-based Action Recognition
Yuanzhong Liu
Junsong Yuan
Zhigang Tu
76
61
0
24 Feb 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
183
227
0
18 Feb 2022
Shift-Memory Network for Temporal Scene Segmentation
Guo Cheng
J. Zheng
136
0
0
17 Feb 2022
Should I take a walk? Estimating Energy Expenditure from Video Data
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
65
4
0
01 Feb 2022
vCLIMB: A Novel Video Class Incremental Learning Benchmark
Andrés Villa
Kumail Alhamoud
Juan Carlos León Alcázar
Fabian Caba Heilbron
Victor Escorcia
Guohao Li
CLL
149
33
0
23 Jan 2022
Self-supervised Video Representation Learning with Cascade Positive Retrieval
Cheng-En Wu
Farley Lai
Yujie Hu
Asim Kadav
SSL
AI4TS
78
3
0
20 Jan 2022
Action Keypoint Network for Efficient Video Recognition
Xu Chen
Yahong Han
Xiaohan Wang
Yifang Sun
Yi Yang
3DPC
79
6
0
17 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
97
221
0
12 Jan 2022
Motion-Focused Contrastive Learning of Video Representations
Rui Li
Yiheng Zhang
Zhaofan Qiu
Ting Yao
Dong Liu
Tao Mei
SSL
90
35
0
11 Jan 2022
Representing Videos as Discriminative Sub-graphs for Action Recognition
Dong Li
Zhaofan Qiu
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
99
26
0
11 Jan 2022
Boosting Video Representation Learning with Multi-Faceted Integration
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Xiaoping Zhang
Dong Wu
Tao Mei
63
9
0
11 Jan 2022
Condensing a Sequence to One Informative Frame for Video Recognition
Zhaofan Qiu
Ting Yao
Y. Shu
Chong-Wah Ngo
Tao Mei
147
9
0
11 Jan 2022
Optimization Planning for 3D ConvNets
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
3DPC
3DH
86
9
0
11 Jan 2022
Discrete and continuous representations and processing in deep learning: Looking forward
Ruben Cartuyvels
Graham Spinks
Marie-Francine Moens
OCL
91
20
0
04 Jan 2022
Fine-grained Multi-Modal Self-Supervised Learning
Duo Wang
S. Karout
SSL
65
7
0
22 Dec 2021
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
Sofia Broomé
Ernest Pokropek
Boyu Li
Hedvig Kjellström
84
7
0
22 Dec 2021
Max-Margin Contrastive Learning
Anshul B. Shah
S. Sra
Ramalingam Chellappa
A. Cherian
SSL
84
46
0
21 Dec 2021
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition
Yinghao Xu
Fangyun Wei
Xiao Sun
Ceyuan Yang
Yujun Shen
Bo Dai
Bolei Zhou
Stephen Lin
VLM
60
56
0
17 Dec 2021
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation
Yujia Zhang
L. Po
Xuyuan Xu
Mengyang Liu
Yexin Wang
Weifeng Ou
Yuzhi Zhao
Weikang Yu
SSL
AI4TS
74
17
0
16 Dec 2021
Temporal Transformer Networks with Self-Supervision for Action Recognition
Yongkang Zhang
Jun Li
Guoming Wu
Hanjie Zhang
Zhiping Shi
Zhaoxun Liu
Zizhang Wu
ViT
66
6
0
14 Dec 2021
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search
Yi Ding
Xinyu Gong
Junru Wu
Humphrey Shi
Zhicheng Yan
Zhangyang Wang
VGen
86
1
0
09 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Yuxuan Liang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
84
21
0
09 Dec 2021
Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning
K. Navaneet
Soroush Abbasi Koohpayegani
Ajinkya Tejankar
Kossar Pourahmadi
Akshayvarun Subramanya
Hamed Pirsiavash
SSL
77
8
0
08 Dec 2021
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification
Rex Liu
Huan Zhang
Hamed Pirsiavash
Xin Liu
ViT
92
13
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
132
134
0
08 Dec 2021
Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
Srijan Das
Michael S. Ryoo
SSL
39
0
0
07 Dec 2021
ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints
Srijan Das
Michael S. Ryoo
SSL
90
20
0
07 Dec 2021
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning
Manlin Zhang
Jinpeng Wang
A. J. Ma
83
9
0
07 Dec 2021
Time-Equivariant Contrastive Video Representation Learning
Simon Jenni
Hailin Jin
SSL
AI4TS
207
61
0
07 Dec 2021
DCAN: Improving Temporal Action Detection via Dual Context Aggregation
Guo Chen
Yin-Dong Zheng
Limin Wang
Tong Lu
AI4TS
137
74
0
07 Dec 2021
E$^2$(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
Chiara Plizzari
M. Planamente
Gabriele Goletto
Marco Cannici
Emanuele Gusso
Matteo Matteucci
Barbara Caputo
EgoV
104
57
0
07 Dec 2021
STSM: Spatio-Temporal Shift Module for Efficient Action Recognition
Zhaoqilin Yang
Gaoyun An
64
5
0
05 Dec 2021
PreViTS: Contrastive Pretraining with Video Tracking Supervision
Brian Chen
Ramprasaath R. Selvaraju
Shih-Fu Chang
Juan Carlos Niebles
Nikhil Naik
ViT
75
2
0
01 Dec 2021
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
Jingjing Jiang
Zi-yi Liu
N. Zheng
80
14
0
29 Nov 2021
Video Frame Interpolation Transformer
Zhihao Shi
Xiangyu Xu
Xiaohong Liu
Jun Chen
Ming-Hsuan Yang
ViT
69
166
0
27 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
85
247
0
25 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
148
221
0
24 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
78
194
0
19 Nov 2021
Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval
Yue Yang
Joongwon Kim
Artemis Panagopoulou
Mark Yatskar
Chris Callison-Burch
LM&Ro
56
14
0
17 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
189
356
0
11 Nov 2021
Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos
Minglang Qiao
Yufan Liu
Mai Xu
Xin Deng
Bing Li
Weiming Hu
Ali Borji
CVBM
65
5
0
05 Nov 2021
Revisiting spatio-temporal layouts for compositional action recognition
Gorjan Radevski
Marie-Francine Moens
Tinne Tuytelaars
104
26
0
02 Nov 2021
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
85
30
0
01 Nov 2021
ST-ABN: Visual Explanation Taking into Account Spatio-temporal Information for Video Recognition
Masahiro Mitsuhara
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
51
1
0
29 Oct 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
116
25
0
27 Oct 2021
Using Motion History Images with 3D Convolutional Networks in Isolated Sign Language Recognition
Hamed Valizadegan
D. Caldwell
SLR
64
51
0
24 Oct 2021
Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos
Reuben Tan
Bryan A. Plummer
Kate Saenko
Hailin Jin
Bryan C. Russell
SSL
94
27
0
20 Oct 2021