Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.04851
Cited By
v1
v2 (latest)
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 657 papers shown
Title
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Yutong Chen
Fangyun Wei
Xiao Sun
Zhirong Wu
Stephen Lin
SLR
88
104
0
08 Mar 2022
End-to-End Semi-Supervised Learning for Video Action Detection
Akash Kumar
Yogesh S Rawat
77
32
0
08 Mar 2022
Behavior Recognition Based on the Integration of Multigranular Motion Features
Lizong Zhang
Yiming Wang
Bei Hui
Xiu Zhang
Sijuan Liu
Shuxin Feng
32
0
0
07 Mar 2022
Motion-driven Visual Tempo Learning for Video-based Action Recognition
Yuanzhong Liu
Junsong Yuan
Zhigang Tu
76
61
0
24 Feb 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
183
227
0
18 Feb 2022
Shift-Memory Network for Temporal Scene Segmentation
Guo Cheng
J. Zheng
136
0
0
17 Feb 2022
Should I take a walk? Estimating Energy Expenditure from Video Data
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
65
4
0
01 Feb 2022
vCLIMB: A Novel Video Class Incremental Learning Benchmark
Andrés Villa
Kumail Alhamoud
Juan Carlos León Alcázar
Fabian Caba Heilbron
Victor Escorcia
Guohao Li
CLL
149
33
0
23 Jan 2022
Self-supervised Video Representation Learning with Cascade Positive Retrieval
Cheng-En Wu
Farley Lai
Yujie Hu
Asim Kadav
SSL
AI4TS
78
3
0
20 Jan 2022
Action Keypoint Network for Efficient Video Recognition
Xu Chen
Yahong Han
Xiaohan Wang
Yifang Sun
Yi Yang
3DPC
79
6
0
17 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
97
221
0
12 Jan 2022
Motion-Focused Contrastive Learning of Video Representations
Rui Li
Yiheng Zhang
Zhaofan Qiu
Ting Yao
Dong Liu
Tao Mei
SSL
90
35
0
11 Jan 2022
Representing Videos as Discriminative Sub-graphs for Action Recognition
Dong Li
Zhaofan Qiu
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
99
26
0
11 Jan 2022
Boosting Video Representation Learning with Multi-Faceted Integration
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Xiaoping Zhang
Dong Wu
Tao Mei
63
9
0
11 Jan 2022
Condensing a Sequence to One Informative Frame for Video Recognition
Zhaofan Qiu
Ting Yao
Y. Shu
Chong-Wah Ngo
Tao Mei
147
9
0
11 Jan 2022
Optimization Planning for 3D ConvNets
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
3DPC
3DH
86
9
0
11 Jan 2022
Discrete and continuous representations and processing in deep learning: Looking forward
Ruben Cartuyvels
Graham Spinks
Marie-Francine Moens
OCL
91
20
0
04 Jan 2022
Fine-grained Multi-Modal Self-Supervised Learning
Duo Wang
S. Karout
SSL
65
7
0
22 Dec 2021
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
Sofia Broomé
Ernest Pokropek
Boyu Li
Hedvig Kjellström
84
7
0
22 Dec 2021
Max-Margin Contrastive Learning
Anshul B. Shah
S. Sra
Ramalingam Chellappa
A. Cherian
SSL
84
46
0
21 Dec 2021
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition
Yinghao Xu
Fangyun Wei
Xiao Sun
Ceyuan Yang
Yujun Shen
Bo Dai
Bolei Zhou
Stephen Lin
VLM
60
56
0
17 Dec 2021
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation
Yujia Zhang
L. Po
Xuyuan Xu
Mengyang Liu
Yexin Wang
Weifeng Ou
Yuzhi Zhao
Weikang Yu
SSL
AI4TS
74
17
0
16 Dec 2021
Temporal Transformer Networks with Self-Supervision for Action Recognition
Yongkang Zhang
Jun Li
Guoming Wu
Hanjie Zhang
Zhiping Shi
Zhaoxun Liu
Zizhang Wu
ViT
66
6
0
14 Dec 2021
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search
Yi Ding
Xinyu Gong
Junru Wu
Humphrey Shi
Zhicheng Yan
Zhangyang Wang
VGen
86
1
0
09 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Yuxuan Liang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
84
21
0
09 Dec 2021
Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning
K. Navaneet
Soroush Abbasi Koohpayegani
Ajinkya Tejankar
Kossar Pourahmadi
Akshayvarun Subramanya
Hamed Pirsiavash
SSL
77
8
0
08 Dec 2021
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification
Rex Liu
Huan Zhang
Hamed Pirsiavash
Xin Liu
ViT
92
13
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
132
134
0
08 Dec 2021
Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
Srijan Das
Michael S. Ryoo
SSL
39
0
0
07 Dec 2021
ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints
Srijan Das
Michael S. Ryoo
SSL
90
20
0
07 Dec 2021
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning
Manlin Zhang
Jinpeng Wang
A. J. Ma
83
9
0
07 Dec 2021
Time-Equivariant Contrastive Video Representation Learning
Simon Jenni
Hailin Jin
SSL
AI4TS
207
61
0
07 Dec 2021
DCAN: Improving Temporal Action Detection via Dual Context Aggregation
Guo Chen
Yin-Dong Zheng
Limin Wang
Tong Lu
AI4TS
137
74
0
07 Dec 2021
E
2
^2
2
(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
Chiara Plizzari
M. Planamente
Gabriele Goletto
Marco Cannici
Emanuele Gusso
Matteo Matteucci
Barbara Caputo
EgoV
104
57
0
07 Dec 2021
STSM: Spatio-Temporal Shift Module for Efficient Action Recognition
Zhaoqilin Yang
Gaoyun An
64
5
0
05 Dec 2021
PreViTS: Contrastive Pretraining with Video Tracking Supervision
Brian Chen
Ramprasaath R. Selvaraju
Shih-Fu Chang
Juan Carlos Niebles
Nikhil Naik
ViT
75
2
0
01 Dec 2021
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
Jingjing Jiang
Zi-yi Liu
N. Zheng
80
14
0
29 Nov 2021
Video Frame Interpolation Transformer
Zhihao Shi
Xiangyu Xu
Xiaohong Liu
Jun Chen
Ming-Hsuan Yang
ViT
69
166
0
27 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
85
247
0
25 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
148
221
0
24 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
78
194
0
19 Nov 2021
Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval
Yue Yang
Joongwon Kim
Artemis Panagopoulou
Mark Yatskar
Chris Callison-Burch
LM&Ro
56
14
0
17 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
189
356
0
11 Nov 2021
Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos
Minglang Qiao
Yufan Liu
Mai Xu
Xin Deng
Bing Li
Weiming Hu
Ali Borji
CVBM
65
5
0
05 Nov 2021
Revisiting spatio-temporal layouts for compositional action recognition
Gorjan Radevski
Marie-Francine Moens
Tinne Tuytelaars
104
26
0
02 Nov 2021
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
85
30
0
01 Nov 2021
ST-ABN: Visual Explanation Taking into Account Spatio-temporal Information for Video Recognition
Masahiro Mitsuhara
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
51
1
0
29 Oct 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
116
25
0
27 Oct 2021
Using Motion History Images with 3D Convolutional Networks in Isolated Sign Language Recognition
Hamed Valizadegan
D. Caldwell
SLR
64
51
0
24 Oct 2021
Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos
Reuben Tan
Bryan A. Plummer
Kate Saenko
Hailin Jin
Bryan C. Russell
SSL
94
27
0
20 Oct 2021
Previous
1
2
3
...
6
7
8
...
12
13
14
Next