ResearchTrend.AI

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
    3DH

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 657 papers shown
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
114
574
0
30 Jun 2021
When Video Classification Meets Incremental Classes
Hanbin Zhao
Xin Qin
Shihao Su
Yongjian Fu
Zibo Lin
Xi Li
CLL
73
28
0
30 Jun 2021
Long-Short Temporal Modeling for Efficient Action Recognition
Liyu Wu
Yuexian Zou
Can Zhang
38
1
0
30 Jun 2021
Unsupervised Discovery of Actions in Instructional Videos
A. Piergiovanni
A. Angelova
Michael S. Ryoo
Irfan Essa
36
3
0
28 Jun 2021
Hyperbolic Busemann Learning with Ideal Prototypes
Mina Ghadimi Atigh
Martin Keller-Ressel
Pascal Mettes
115
40
0
28 Jun 2021
Can An Image Classifier Suffice For Action Recognition?
Quanfu Fan
Chun-Fu Chen
Chen
Yikang Shen
ViT
95
34
0
26 Jun 2021
Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering
Long Hoang Dang
T. Le
Vuong Le
T. Tran
98
62
0
25 Jun 2021
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
125
1,498
0
24 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
149
129
0
21 Jun 2021
All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers
Carmelo Scribano
D. Sapienza
Giorgia Franchini
M. Verucchi
Marko Bertogna
58
4
0
18 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
108
11
0
18 Jun 2021
Multi-Granularity Network with Modal Attention for Dense Affective Understanding
Baoming Yan
Lin Wang
Ke Gao
Bo Gao
Xiao-Chang Liu
Chao Ban
Jiang Yang
Xiaobo Li
VGen
39
2
0
18 Jun 2021
MaCLR: Motion-aware Contrastive Learning of Representations for Videos
Fanyi Xiao
Joseph Tighe
Davide Modolo
SSL
68
14
0
17 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
154
11
0
12 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
114
282
0
09 Jun 2021
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
Linjie Li
Jie Lei
Zhe Gan
Licheng Yu
Yen-Chun Chen
...
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
112
103
0
08 Jun 2021
Transformed ROIs for Capturing Visual Transformations in Videos
Abhinav Rai
Fadime Sener
Angela Yao
ViT
69
3
0
06 Jun 2021
ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency
Deng Huang
Wenhao Wu
Weiwen Hu
Xu Liu
Dongliang He
Zhihua Wu
Xiangmiao Wu
Ming Tan
Errui Ding
SSL
69
55
0
04 Jun 2021
CT-Net: Channel Tensorization Network for Video Classification
Kunchang Li
Xianhang Li
Yali Wang
Jun Wang
Yu Qiao
ViT
72
55
0
03 Jun 2021
TSI: Temporal Saliency Integration for Video Action Recognition
Haisheng Su
Kunchang Li
Jinyuan Feng
Dongliang Wang
Weihao Gan
Wei Wu
Yu Qiao
57
4
0
02 Jun 2021
Connecting Language and Vision for Natural Language-Based Vehicle Retrieval
Shuai Bai
Zhedong Zheng
Xiaohan Wang
Junyang Lin
Zhu Zhang
Chang Zhou
Yi Yang
Hongxia Yang
103
27
0
31 May 2021
Multi-Modal Semantic Inconsistency Detection in Social Media News Posts
S. McCrae
Kehan Wang
A. Zakhor
60
15
0
26 May 2021
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning
Wenhao Wu
Yuxiang Zhao
Yanwu Xu
Xiao Tan
Dongliang He
...
Jinxing Ye
Yingying Li
Mingde Yao
Zichao Dong
Yifeng Shi
AI4TS
91
30
0
25 May 2021
Temporal Action Proposal Generation with Transformers
Lining Wang
Haosen Yang
Wenhao Wu
Huanjin Yao
Hujie Huang
ViT
85
28
0
25 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
82
133
0
20 May 2021
MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations
Taojiannan Yang
Sijie Zhu
Matías Mendieta
Pu Wang
Ravikumar Balakrishnan
Minwoo Lee
T. Han
M. Shah
Chong Chen
3DH, OOD
102
24
0
14 May 2021
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation
Tianrui Hui
Shaofei Huang
Si Liu
Zihan Ding
Guanbin Li
Wenguan Wang
Jizhong Han
Fei Wang
76
49
0
14 May 2021
REGINA - Reasoning Graph Convolutional Networks in Human Action Recognition
Bruno Degardin
Vasco Lopes
Hugo Proença
3DH, GNN
62
10
0
14 May 2021
Designing Multimodal Datasets for NLP Challenges
James Pustejovsky
E. Holderness
Jingxuan Tu
Parker Glenn
Kyeongmin Rim
Kelley Lynch
R. Brutti
46
5
0
12 May 2021
Temporal-Spatial Feature Pyramid for Video Saliency Detection
Qinyao Chang
Shiping Zhu
84
27
0
10 May 2021
Adaptive Focus for Efficient Video Recognition
Yulin Wang
Zhaoxi Chen
Haojun Jiang
Shiji Song
Yizeng Han
Gao Huang
103
100
0
07 May 2021
Motion-Augmented Self-Training for Video Recognition at Smaller Scale
Kirill Gavrilyuk
Mihir Jain
I. Karmanov
Cees G. M. Snoek
71
21
0
04 May 2021
BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video Person Re-Identification
Rui Hou
Hong Chang
Bingpeng Ma
Rui Huang
Shiguang Shan
82
87
0
30 Apr 2021
Three-stream network for enriched Action Recognition
Ivaxi Sheth
27
4
0
27 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
368
594
0
22 Apr 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
203
174
0
20 Apr 2021
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition
Zejia Weng
Zuxuan Wu
Hengduo Li
Jingjing Chen
Yu-Gang Jiang
80
4
0
20 Apr 2021
Temporal Query Networks for Fine-grained Video Understanding
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
119
87
0
19 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP, VLM
489
816
0
18 Apr 2021
Adaptive Intermediate Representations for Video Understanding
Juhana Kangaspunta
A. Piergiovanni
Rico Jonschkowski
Michael S. Ryoo
A. Angelova
51
3
0
14 Apr 2021
Video Question Answering with Phrases via Semantic Roles
Arka Sadhu
Kan Chen
Ram Nevatia
51
16
0
08 Apr 2021
Progressive Temporal Feature Alignment Network for Video Inpainting
Xueyan Zou
Linjie Yang
Ding Liu
Yong Jae Lee
84
57
0
08 Apr 2021
ACM-Net: Action Context Modeling Network for Weakly-Supervised Temporal Action Localization
Sanqing Qu
Guang Chen
Zhijun Li
Lijun Zhang
Fan Lu
Alois C. Knoll
102
55
0
07 Apr 2021
CCSNet: a deep learning modeling suite for CO$_2$ storage
Gege Wen
C. Hay
S. Benson
85
77
0
05 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
215
1,193
0
01 Apr 2021
Adaptive Configuration of In Situ Lossy Compression for Cosmology Simulations via Fine-Grained Rate-Quality Modeling
Sian Jin
Jesus Pulido
Pascal Grosset
Jiannan Tian
Dingwen Tao
J. Ahrens
78
23
0
01 Apr 2021
Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective
Jiarui Xu
Xiaolong Wang
VOS
194
95
0
31 Mar 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL, AI4TS
137
128
0
30 Mar 2021
Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation
Shuning Chang
Pichao Wang
F. Wang
Hao Li
Jiashi Feng
ViT
83
42
0
30 Mar 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
242
2,175
0
29 Mar 2021