ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1712.04851
  4. Cited By
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in
  Video Classification

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Z. Tu
Kevin Patrick Murphy
    3DH
ArXivPDFHTML

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 650 papers shown
Title
Two-Stream Network for Sign Language Recognition and Translation
Two-Stream Network for Sign Language Recognition and Translation
Yutong Chen
Ronglai Zuo
Fangyun Wei
Yu-Huan Wu
Shujie Liu
Brian Mak
SLR
42
118
0
02 Nov 2022
Sparse in Space and Time: Audio-visual Synchronisation with Trainable
  Selectors
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
39
20
0
13 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal
  Contrastive Learning
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
20
68
0
12 Oct 2022
Contrastive Video-Language Learning with Fine-grained Frame Sampling
Contrastive Video-Language Learning with Fine-grained Frame Sampling
Zixu Wang
Yujie Zhong
Yishu Miao
Lin Ma
Lucia Specia
52
11
0
10 Oct 2022
Quantitative Metrics for Evaluating Explanations of Video DeepFake
  Detectors
Quantitative Metrics for Evaluating Explanations of Video DeepFake Detectors
Federico Baldassarre
Quentin Debard
Gonzalo Fiz Pontiveros
Tri Kurniawan Wijaya
44
4
0
07 Oct 2022
Locate before Answering: Answer Guided Question Localization for Video
  Question Answering
Locate before Answering: Answer Guided Question Localization for Video Question Answering
Tianwen Qian
Ran Cui
Jingjing Chen
Pai Peng
Xiao-Wei Guo
Yu-Gang Jiang
34
17
0
05 Oct 2022
Alignment-guided Temporal Attention for Video Action Recognition
Alignment-guided Temporal Attention for Video Action Recognition
Yizhou Zhao
Zhenyang Li
Xun Guo
Yan Lu
20
14
0
30 Sep 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data
Make-A-Video: Text-to-Video Generation without Text-Video Data
Uriel Singer
Adam Polyak
Thomas Hayes
Xiaoyue Yin
Jie An
...
Oron Ashual
Oran Gafni
Devi Parikh
Sonal Gupta
Yaniv Taigman
DiffM
VGen
45
1,352
0
29 Sep 2022
Rethinking Resolution in the Context of Efficient Video Recognition
Rethinking Resolution in the Context of Efficient Video Recognition
Chuofan Ma
Qiushan Guo
Yi-Xin Jiang
Zehuan Yuan
Ping Luo
Xiaojuan Qi
68
12
0
26 Sep 2022
LGDN: Language-Guided Denoising Network for Video-Language Modeling
LGDN: Language-Guided Denoising Network for Video-Language Modeling
Haoyu Lu
Mingyu Ding
Nanyi Fei
Yuqi Huo
Zhiwu Lu
VLM
85
16
0
23 Sep 2022
Multi-level Adversarial Spatio-temporal Learning for Footstep Pressure
  based FoG Detection
Multi-level Adversarial Spatio-temporal Learning for Footstep Pressure based FoG Detection
Kun Hu
Shaohui Mei
Wei Wang
K. E. Martens
Liang Wang
S. Lewis
D. Feng
Zhiyong Wang
35
5
0
22 Sep 2022
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
38
149
0
15 Sep 2022
Multiple View Performers for Shape Completion
Multiple View Performers for Shape Completion
David Watkins-Valls
Peter K. Allen
K. Choromanski
Jacob Varley
Nicholas R. Waytowich
22
1
0
13 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked
  Visual Modeling
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
32
64
0
04 Sep 2022
Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action
  Recognition
Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
Tianjiao Li
Lin Geng Foo
Qiuhong Ke
Hossein Rahmani
Anran Wang
Jinghua Wang
Xiaozhong Liu
19
21
0
03 Sep 2022
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain
  Generalization
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization
Zdravko Marinov
Alina Roitberg
David Schneider
Rainer Stiefelhagen
24
4
0
19 Aug 2022
M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval
M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval
Shuo Liu
Weize Quan
Mingyuan Zhou
Sihong Chen
Jian Kang
Zhenlan Zhao
Chen Chen
Dong-Ming Yan
28
0
0
16 Aug 2022
TL;DW? Summarizing Instructional Videos with Task Relevance &
  Cross-Modal Saliency
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
Medhini Narasimhan
Arsha Nagrani
Chen Sun
Michael Rubinstein
Trevor Darrell
Anna Rohrbach
Cordelia Schmid
20
34
0
14 Aug 2022
Motion Sensitive Contrastive Learning for Self-supervised Video
  Representation
Motion Sensitive Contrastive Learning for Self-supervised Video Representation
Jingcheng Ni
Nana Zhou
Jie Qin
Qianrun Wu
Junqi Liu
Boxun Li
Di Huang
SSL
42
16
0
12 Aug 2022
Class-attention Video Transformer for Engagement Intensity Prediction
Class-attention Video Transformer for Engagement Intensity Prediction
Xusheng Ai
Victor S. Sheng
Chunhua Li
Zhiming Cui
ViT
17
5
0
12 Aug 2022
Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction
Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction
Ying Fan
Longfei Han
Yue Zhang
Lechao Cheng
Chenzhen Xia
Di Hu
SSL
21
1
0
10 Aug 2022
Sports Video Analysis on Large-Scale Data
Sports Video Analysis on Large-Scale Data
Dekun Wu
Henghui Zhao
Xingce Bao
Richard P. Wildes
29
13
0
09 Aug 2022
Frozen CLIP Models are Efficient Video Learners
Frozen CLIP Models are Efficient Video Learners
Ziyi Lin
Shijie Geng
Renrui Zhang
Peng Gao
Gerard de Melo
Xiaogang Wang
Jifeng Dai
Yu Qiao
Hongsheng Li
CLIP
VLM
16
200
0
06 Aug 2022
Expanding Language-Image Pretrained Models for General Video Recognition
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
40
313
0
04 Aug 2022
Multimodal Generation of Novel Action Appearances for Synthetic-to-Real
  Recognition of Activities of Daily Living
Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living
Zdravko Marinov
David Schneider
Alina Roitberg
Rainer Stiefelhagen
VGen
32
2
0
03 Aug 2022
Two-Stream Transformer Architecture for Long Video Understanding
Two-Stream Transformer Architecture for Long Video Understanding
Edward Fish
Jon Weinbren
Andrew Gilbert
ViT
33
6
0
02 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
34
18
0
01 Aug 2022
Static and Dynamic Concepts for Self-supervised Video Representation
  Learning
Static and Dynamic Concepts for Self-supervised Video Representation Learning
Rui Qian
Shuangrui Ding
Xian Liu
Dahua Lin
SSL
36
22
0
26 Jul 2022
P2ANet: A Dataset and Benchmark for Dense Action Detection from Table
  Tennis Match Broadcasting Videos
P2ANet: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos
Jiang Bian
Xuhong Li
Tao Wang
Qingzhong Wang
Jun Huang
Chen Liu
Jun Zhao
Feixiang Lu
Dejing Dou
Haoyi Xiong
26
10
0
26 Jul 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
27
10
0
21 Jul 2022
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
Boyang Xia
Wenhao Wu
Haoran Wang
Rui Su
Dongliang He
Haosen Yang
Xiaoran Fan
Wanli Ouyang
23
21
0
21 Jul 2022
Temporal Saliency Query Network for Efficient Video Recognition
Temporal Saliency Query Network for Efficient Video Recognition
Boyang Xia
Zhihao Wang
Wenhao Wu
Haoran Wang
Jungong Han
51
15
0
21 Jul 2022
GOCA: Guided Online Cluster Assignment for Self-Supervised Video
  Representation Learning
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
Huseyin Coskun
Alireza Zareian
Joshua L. Moore
F. Tombari
Chen Wang
SSL
53
3
0
20 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Is an Object-Centric Video Representation Beneficial for Transfer?
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
37
27
0
20 Jul 2022
ViGAT: Bottom-up event recognition and explanation in video using
  factorized graph attention network
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
19
10
0
20 Jul 2022
Learning Sequence Representations by Non-local Recurrent Neural Memory
Learning Sequence Representations by Non-local Recurrent Neural Memory
Wenjie Pei
Xin Feng
Canmiao Fu
Qi Cao
Guangming Lu
Yu-Wing Tai
AI4TS
27
1
0
20 Jul 2022
ERA: Expert Retrieval and Assembly for Early Action Prediction
ERA: Expert Retrieval and Assembly for Early Action Prediction
Lin Geng Foo
Tianjiao Li
Hossein Rahmani
Qiuhong Ke
Xiaozhong Liu
19
15
0
20 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos
SVGraph: Learning Semantic Graphs from Instructional Videos
Madeline Chantry Schiappa
Yogesh S Rawat
17
4
0
16 Jul 2022
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
Yuqi Liu
Pengfei Xiong
Luhui Xu
Shengming Cao
Qin Jin
39
114
0
16 Jul 2022
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision
  and Language Models
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
Rui Qian
Yeqing Li
Zheng Xu
Ming Yang
Serge Belongie
Huayu Chen
VLM
41
22
0
15 Jul 2022
Long-term Leap Attention, Short-term Periodic Shift for Video
  Classification
Long-term Leap Attention, Short-term Periodic Shift for Video Classification
Huan Zhang
Lechao Cheng
Y. Hao
Chong-Wah Ngo
ViT
31
10
0
12 Jul 2022
Video Graph Transformer for Video Question Answering
Video Graph Transformer for Video Question Answering
Junbin Xiao
Pan Zhou
Tat-Seng Chua
Shuicheng Yan
ViT
159
75
0
12 Jul 2022
Dual Contrastive Learning for Spatio-temporal Representation
Dual Contrastive Learning for Spatio-temporal Representation
Shuangrui Ding
Rui Qian
H. Xiong
AI4TS
SSL
35
21
0
12 Jul 2022
VidConv: A modernized 2D ConvNet for Efficient Video Recognition
VidConv: A modernized 2D ConvNet for Efficient Video Recognition
Chuong H. Nguyen
Su Huynh
Vinh Nguyen
Ngoc-Khanh Nguyen
ViT
27
3
0
08 Jul 2022
Video Dialog as Conversation about Objects Living in Space-Time
Video Dialog as Conversation about Objects Living in Space-Time
H. Pham
T. Le
Vuong Le
Tu Minh Phuong
T. Tran
44
11
0
08 Jul 2022
Robustness Analysis of Video-Language Models Against Visual and Language
  Perturbations
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
Madeline Chantry Schiappa
Shruti Vyas
Hamid Palangi
Yogesh S Rawat
Vibhav Vineet
VLM
123
17
0
05 Jul 2022
Large-scale Robustness Analysis of Video Action Recognition Models
Large-scale Robustness Analysis of Video Action Recognition Models
Madeline Chantry Schiappa
Naman Biyani
Prudvi Kamtam
Shruti Vyas
Hamid Palangi
Vibhav Vineet
Yogesh S Rawat
AAML
37
24
0
04 Jul 2022
GraphVid: It Only Takes a Few Nodes to Understand a Video
GraphVid: It Only Takes a Few Nodes to Understand a Video
Eitan Kosman
Dotan Di Castro
GNN
38
5
0
04 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video
  Recognition
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
103
93
0
04 Jul 2022
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Taeoh Kim
Jinhyung Kim
Minho Shim
Sangdoo Yun
Myunggu Kang
Dongyoon Wee
Sangyoun Lee
AI4TS
20
10
0
30 Jun 2022
Previous
123456...111213
Next