1712.04851
Cited By
v1
v2 (latest)
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
Papers citing
"Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 657 papers shown
Title
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
94
1
0
03 Dec 2022
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu
Biaolong Chen
Yue Liao
Shuwen Xiao
Wenyu Sun
Xiaobo Li
Yousong Zhu
Jinqiao Wang
Si Liu
CLIP
70
12
0
02 Dec 2022
Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition
Rohit Gupta
Naveed Akhtar
Gaurav Kumar Nayak
Ajmal Mian
M. Shah
AAML
69
1
0
23 Nov 2022
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
Guoxi Huang
A. Bors
47
1
0
23 Nov 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
77
9
0
21 Nov 2022
Generalizable Deepfake Detection with Phase-Based Motion Analysis
Ekta Prashnani
Michael Goebel
B. S. Manjunath
79
6
0
17 Nov 2022
Dynamic Temporal Filtering in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Chong-Wah Ngo
Tao Mei
AI4TS
103
18
0
15 Nov 2022
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
H. Haresamudram
Irfan Essa
55
5
0
08 Nov 2022
Two-Stream Network for Sign Language Recognition and Translation
Yutong Chen
Ronglai Zuo
Fangyun Wei
Yu-Huan Wu
Shujie Liu
Brian Mak
SLR
81
132
0
02 Nov 2022
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
86
22
0
13 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
89
72
0
12 Oct 2022
Contrastive Video-Language Learning with Fine-grained Frame Sampling
Zixu Wang
Yujie Zhong
Yishu Miao
Lin Ma
Lucia Specia
97
12
0
10 Oct 2022
Quantitative Metrics for Evaluating Explanations of Video DeepFake Detectors
Federico Baldassarre
Quentin Debard
Gonzalo Fiz Pontiveros
Tri Kurniawan Wijaya
82
4
0
07 Oct 2022
Locate before Answering: Answer Guided Question Localization for Video Question Answering
Tianwen Qian
Ran Cui
Jingjing Chen
Pai Peng
Xiao-Wei Guo
Yu-Gang Jiang
105
18
0
05 Oct 2022
Alignment-guided Temporal Attention for Video Action Recognition
Yizhou Zhao
Zhenyang Li
Xun Guo
Yan Lu
65
14
0
30 Sep 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data
Uriel Singer
Adam Polyak
Thomas Hayes
Xiaoyue Yin
Jie An
...
Oron Ashual
Oran Gafni
Devi Parikh
Sonal Gupta
Yaniv Taigman
DiffM
VGen
97
1,439
0
29 Sep 2022
Rethinking Resolution in the Context of Efficient Video Recognition
Chuofan Ma
Qiushan Guo
Yi Jiang
Zehuan Yuan
Ping Luo
Xiaojuan Qi
119
12
0
26 Sep 2022
LGDN: Language-Guided Denoising Network for Video-Language Modeling
Haoyu Lu
Mingyu Ding
Nanyi Fei
Yuqi Huo
Zhiwu Lu
VLM
148
16
0
23 Sep 2022
Multi-level Adversarial Spatio-temporal Learning for Footstep Pressure based FoG Detection
Kun Hu
Shaohui Mei
Wei Wang
K. E. Martens
Liang Wang
S. Lewis
Dagan Feng
Zhiyong Wang
83
6
0
22 Sep 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
136
153
0
15 Sep 2022
Multiple View Performers for Shape Completion
David Watkins-Valls
Peter K. Allen
K. Choromanski
Jacob Varley
Nicholas R. Waytowich
46
1
0
13 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
130
65
0
04 Sep 2022
Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
Tianjiao Li
Lin Geng Foo
Qiuhong Ke
Hossein Rahmani
Anran Wang
Jinghua Wang
Jing Liu
81
23
0
03 Sep 2022
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization
Zdravko Marinov
Alina Roitberg
David Schneider
Rainer Stiefelhagen
63
5
0
19 Aug 2022
M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval
Shuo Liu
Weize Quan
Mingyuan Zhou
Sihong Chen
Jian Kang
Zhenlan Zhao
Chen Chen
Dong-Ming Yan
47
0
0
16 Aug 2022
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
Medhini Narasimhan
Arsha Nagrani
Chen Sun
Michael Rubinstein
Trevor Darrell
Anna Rohrbach
Cordelia Schmid
83
35
0
14 Aug 2022
Motion Sensitive Contrastive Learning for Self-supervised Video Representation
Jingcheng Ni
Nana Zhou
Jie Qin
Qianrun Wu
Junqi Liu
Boxun Li
Di Huang
SSL
108
17
0
12 Aug 2022
Class-attention Video Transformer for Engagement Intensity Prediction
Xusheng Ai
Victor S. Sheng
Chunhua Li
Zhiming Cui
ViT
28
7
0
12 Aug 2022
Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction
Ying Fan
Longfei Han
Yue Zhang
Lechao Cheng
Chenzhen Xia
Di Hu
SSL
46
1
0
10 Aug 2022
Sports Video Analysis on Large-Scale Data
Dekun Wu
Henghui Zhao
Xingce Bao
Richard P. Wildes
65
14
0
09 Aug 2022
Frozen CLIP Models are Efficient Video Learners
Ziyi Lin
Shijie Geng
Renrui Zhang
Peng Gao
Gerard de Melo
Xiaogang Wang
Jifeng Dai
Yu Qiao
Hongsheng Li
CLIP
VLM
98
209
0
06 Aug 2022
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
125
328
0
04 Aug 2022
Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living
Zdravko Marinov
David Schneider
Alina Roitberg
Rainer Stiefelhagen
VGen
69
3
0
03 Aug 2022
Two-Stream Transformer Architecture for Long Video Understanding
Edward Fish
Jon Weinbren
Andrew Gilbert
ViT
52
6
0
02 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
104
18
0
01 Aug 2022
Static and Dynamic Concepts for Self-supervised Video Representation Learning
Rui Qian
Shuangrui Ding
Xian Liu
Dahua Lin
SSL
81
25
0
26 Jul 2022
P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos
Jiang Bian
Xuhong Li
Tao Wang
Qingzhong Wang
Jun Huang
Chen Liu
Jun Zhao
Feixiang Lu
Dejing Dou
Haoyi Xiong
63
11
0
26 Jul 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
103
10
0
21 Jul 2022
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
Boyang Xia
Wenhao Wu
Haoran Wang
Rui Su
Dongliang He
Haosen Yang
Xiaoran Fan
Wanli Ouyang
119
22
0
21 Jul 2022
Temporal Saliency Query Network for Efficient Video Recognition
Boyang Xia
Zhihao Wang
Wenhao Wu
Haoran Wang
Jungong Han
100
16
0
21 Jul 2022
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
Huseyin Coskun
Alireza Zareian
Joshua L. Moore
F. Tombari
Chen Wang
SSL
98
3
0
20 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
142
27
0
20 Jul 2022
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
53
10
0
20 Jul 2022
Learning Sequence Representations by Non-local Recurrent Neural Memory
Wenjie Pei
Xin Feng
Canmiao Fu
Qi Cao
Guangming Lu
Yu-Wing Tai
AI4TS
72
1
0
20 Jul 2022
ERA: Expert Retrieval and Assembly for Early Action Prediction
Lin Geng Foo
Tianjiao Li
Hossein Rahmani
Qiuhong Ke
Jing Liu
75
15
0
20 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos
Madeline Chantry Schiappa
Yogesh S Rawat
61
4
0
16 Jul 2022
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
95
122
0
16 Jul 2022
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
Rui Qian
Yeqing Li
Zheng Xu
Ming-Hsuan Yang
Serge Belongie
Huayu Chen
VLM
74
22
0
15 Jul 2022
Long-term Leap Attention, Short-term Periodic Shift for Video Classification
Huatian Zhang
Lechao Cheng
Y. Hao
Chong-Wah Ngo
ViT
77
10
0
12 Jul 2022
Video Graph Transformer for Video Question Answering
Junbin Xiao
Pan Zhou
Tat-Seng Chua
Shuicheng Yan
ViT
229
78
0
12 Jul 2022