1712.04851
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 657 papers shown
Title
Dual Contrastive Learning for Spatio-temporal Representation
Shuangrui Ding
Rui Qian
H. Xiong
AI4TS
SSL
63
23
0
12 Jul 2022
VidConv: A modernized 2D ConvNet for Efficient Video Recognition
Chuong H. Nguyen
Su Huynh
Vinh Nguyen
Ngoc-Khanh Nguyen
ViT
62
3
0
08 Jul 2022
Video Dialog as Conversation about Objects Living in Space-Time
H. Pham
T. Le
Vuong Le
Tu Minh Phuong
T. Tran
80
11
0
08 Jul 2022
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
Madeline Chantry Schiappa
Shruti Vyas
Hamid Palangi
Yogesh S Rawat
Vibhav Vineet
VLM
164
20
0
05 Jul 2022
Large-scale Robustness Analysis of Video Action Recognition Models
Madeline Chantry Schiappa
Naman Biyani
Prudvi Kamtam
Shruti Vyas
Hamid Palangi
Vibhav Vineet
Yogesh S Rawat
AAML
87
28
0
04 Jul 2022
GraphVid: It Only Takes a Few Nodes to Understand a Video
Eitan Kosman
Dotan Di Castro
GNN
85
5
0
04 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
188
99
0
04 Jul 2022
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Taeoh Kim
Jinhyung Kim
Minho Shim
Sangdoo Yun
Myunggu Kang
Dongyoon Wee
Sangyoun Lee
AI4TS
116
10
0
30 Jun 2022
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Junting Pan
Ziyi Lin
Xiatian Zhu
Jing Shao
Hongsheng Li
96
206
0
27 Jun 2022
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos
S. H. Khorasgani
Yuxuan Chen
Florian Shkurti
SSL
114
24
0
25 Jun 2022
Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
Kailai Zhou
Yibo Wang
Tao Lv
Yunqian Li
Linsen Chen
Qiu Shen
Xun Cao
63
11
0
23 Jun 2022
Symmetric Network with Spatial Relationship Modeling for Natural Language-based Vehicle Retrieval
Chuyang Zhao
Haobo Chen
Wenyuan Zhang
Junru Chen
Sipeng Zhang
Yadong Li
Boxun Li
66
10
0
22 Jun 2022
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning
Fuchen Long
Ting Yao
Zhaofan Qiu
Xinmei Tian
Jiebo Luo
Tao Mei
75
6
0
21 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
128
136
0
18 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
150
388
0
17 Jun 2022
Stand-Alone Inter-Frame Attention in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Jiebo Luo
Tao Mei
ViT
65
47
0
14 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
73
15
0
13 Jun 2022
Words are all you need? Language as an approximation for human similarity judgments
Raja Marjieh
Pol van Rijn
Ilia Sucholutsky
T. Sumers
Harin Lee
Thomas Griffiths
Nori Jacoby
90
19
0
08 Jun 2022
Egocentric Video-Language Pretraining
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
...
Hongfa Wang
Dima Damen
Guohao Li
Wei Liu
Mike Zheng Shou
VLM
EgoV
104
207
0
03 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
84
57
0
02 Jun 2022
Deep Posterior Distribution-based Embedding for Hyperspectral Image Super-resolution
Jinhui Hou
Zhiyu Zhu
Junhui Hou
Huanqiang Zeng
Jinjian Wu
Jiantao Zhou
SupR
82
17
0
30 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
89
35
0
10 May 2022
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Mingdong Yang
Guo Chen
Yin-Dong Zheng
Tong Lu
Limin Wang
85
48
0
05 May 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
61
0
0
03 May 2022
Preserve Pre-trained Knowledge: Transfer Learning With Self-Distillation For Action Recognition
Yang Zhou
Zhanhao He
Ke Lu
Guanhong Wang
Gaoang Wang
CLL
SLR
88
2
0
01 May 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
81
44
0
26 Apr 2022
Contrastive Language-Action Pre-training for Temporal Localization
Mengmeng Xu
Erhan Gundogdu
Maksim
Guohao Li
M. Donoser
Loris Bazzani
100
27
0
26 Apr 2022
Temporal Relevance Analysis for Video Action Models
Quanfu Fan
Donghyun Kim
Chun-Fu Chen
Stan Sclaroff
Kate Saenko
Sarah Adel Bargal
FAtt
66
0
0
25 Apr 2022
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai
Ji Lin
Chengyue Wu
Zhijian Liu
Haotian Tang
Hanrui Wang
Ligeng Zhu
Song Han
116
115
0
25 Apr 2022
Attention in Attention: Modeling Context Correlation for Efficient Video Classification
Y. Hao
Shuo Wang
P. Cao
Xinjian Gao
Tong Xu
Jinmeng Wu
Xiangnan He
93
41
0
20 Apr 2022
Temporally Efficient Vision Transformer for Video Instance Segmentation
Shusheng Yang
Xinggang Wang
Yu Li
Yuxin Fang
Jiemin Fang
Wenyu Liu
Xun Zhao
Ying Shan
ViT
72
67
0
18 Apr 2022
Video Action Detection: Analysing Limitations and Challenges
Rajat Modi
A. J. Rana
Akash Kumar
Praveen Tirupattur
Shruti Vyas
Yogesh S Rawat
M. Shah
98
12
0
17 Apr 2022
Model-agnostic Multi-Domain Learning with Domain-Specific Adapters for Action Recognition
Kazuki Omi
Jun Kimata
Toru Tamaki
76
8
0
15 Apr 2022
Learning Pixel-Level Distinctions for Video Highlight Detection
Fanyue Wei
Biao Wang
T. Ge
Yuning Jiang
Wen Li
Lixin Duan
44
20
0
10 Apr 2022
Self-Supervised Video Representation Learning with Motion-Contrastive Perception
Jin-Yuan Liu
Ying Cheng
Yuejie Zhang
Ruiwei Zhao
Rui Feng
SSL
70
1
0
10 Apr 2022
Probabilistic Representations for Video Contrastive Learning
Jungin Park
Jiyoung Lee
Ig-Jae Kim
Kwanghoon Sohn
SSL
106
47
0
08 Apr 2022
Frequency Selective Augmentation for Video Representation Learning
Jinhyung Kim
Taeoh Kim
Minho Shim
Dongyoon Han
Dongyoon Wee
Junmo Kim
AI4TS
101
4
0
08 Apr 2022
Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations
Jie Jiang
Shaobo Min
Weijie Kong
Dihong Gong
Hongfa Wang
Zhifeng Li
Wei Liu
VLM
122
20
0
07 Apr 2022
Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yi Tian Xu
Xiang Wang
Mingqian Tang
Changxin Gao
Rong Jin
Nong Sang
SSL
AI4TS
76
17
0
06 Apr 2022
An Empirical Study of End-to-End Temporal Action Detection
Xiaolong Liu
S. Bai
Xiang Bai
92
59
0
06 Apr 2022
Exploiting Temporal Relations on Radar Perception for Autonomous Driving
Peizhao Li
Puzuo Wang
K. Berntorp
Hongfu Liu
110
43
0
03 Apr 2022
Deformable Video Transformer
Jue Wang
Lorenzo Torresani
ViT
98
28
0
31 Mar 2022
Video-Text Representation Learning via Differentiable Weak Temporal Alignment
Dohwan Ko
Joonmyung Choi
Juyeon Ko
Shinyeong Noh
Kyoung-Woon On
Eun-Sol Kim
Hyunwoo J. Kim
VGen
AI4TS
84
22
0
31 Mar 2022
Controllable Augmentations for Video Representation Learning
Rui Qian
Weiyao Lin
John See
Dian Li
SSL
AI4TS
54
10
0
30 Mar 2022
Interpretable Prediction of Pulmonary Hypertension in Newborns using Echocardiograms
H. Ragnarsdóttir
Laura Manduchi
H. Michel
F. Laumer
S. Wellmann
Ece Ozkan
Julia-Franziska Vogt
51
3
0
24 Mar 2022
Facial Expression Analysis Using Decomposed Multiscale Spatiotemporal Networks
W. Melo
Eric Granger
Miguel Bordallo López
CVBM
78
22
0
21 Mar 2022
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
Thanh-Dat Truong
Quoc-Huy Bui
C. Duong
Han-Seok Seo
Son Lam Phung
Xin Li
Khoa Luu
ViT
121
51
0
19 Mar 2022
Group Contextualization for Video Recognition
Y. Hao
Haotong Zhang
Chong-Wah Ngo
Xiangnan He
59
27
0
18 Mar 2022
Gate-Shift-Fuse for Video Action Recognition
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
85
24
0
16 Mar 2022
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding
Yidan Sun
Qin Chao
Yangfeng Ji
Boyang Albert Li
VGen
79
11
0
11 Mar 2022