ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1712.04851
  4. Cited By
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in
  Video Classification

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
    3DH
ArXivPDFHTML

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 650 papers shown
Title
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
37
127
0
21 Jun 2021
All You Can Embed: Natural Language based Vehicle Retrieval with
  Spatio-Temporal Transformers
All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers
Carmelo Scribano
D. Sapienza
Giorgia Franchini
M. Verucchi
Marko Bertogna
34
4
0
18 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream
  Prototypical Contrasting
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
40
11
0
18 Jun 2021
Multi-Granularity Network with Modal Attention for Dense Affective
  Understanding
Multi-Granularity Network with Modal Attention for Dense Affective Understanding
Baoming Yan
Lin Wang
Ke Gao
Bo Gao
Xiao-Chang Liu
Chao Ban
Jiang Yang
Xiaobo Li
VGen
6
1
0
18 Jun 2021
MaCLR: Motion-aware Contrastive Learning of Representations for Videos
MaCLR: Motion-aware Contrastive Learning of Representations for Videos
Fanyi Xiao
Joseph Tighe
Davide Modolo
SSL
16
13
0
17 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
27
11
0
12 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
27
274
0
09 Jun 2021
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding
  Evaluation
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
Linjie Li
Jie Lei
Zhe Gan
Licheng Yu
Yen-Chun Chen
...
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
32
100
0
08 Jun 2021
Transformed ROIs for Capturing Visual Transformations in Videos
Transformed ROIs for Capturing Visual Transformations in Videos
Abhinav Rai
Fadime Sener
Angela Yao
ViT
24
3
0
06 Jun 2021
ASCNet: Self-supervised Video Representation Learning with
  Appearance-Speed Consistency
ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency
Deng Huang
Wenhao Wu
Weiwen Hu
Xu Liu
Dongliang He
Zhihua Wu
Xiangmiao Wu
Ming Tan
Errui Ding
SSL
13
55
0
04 Jun 2021
CT-Net: Channel Tensorization Network for Video Classification
CT-Net: Channel Tensorization Network for Video Classification
Kunchang Li
Xianhang Li
Yali Wang
Jun Wang
Yu Qiao
ViT
30
55
0
03 Jun 2021
TSI: Temporal Saliency Integration for Video Action Recognition
TSI: Temporal Saliency Integration for Video Action Recognition
Haisheng Su
Kunchang Li
Jinyuan Feng
Dongliang Wang
Weihao Gan
Wei Wu
Yu Qiao
29
4
0
02 Jun 2021
Connecting Language and Vision for Natural Language-Based Vehicle
  Retrieval
Connecting Language and Vision for Natural Language-Based Vehicle Retrieval
Shuai Bai
Zhedong Zheng
Xiaohan Wang
Junyang Lin
Zhu Zhang
Chang Zhou
Yi Yang
Hongxia Yang
24
27
0
31 May 2021
Multi-Modal Semantic Inconsistency Detection in Social Media News Posts
Multi-Modal Semantic Inconsistency Detection in Social Media News Posts
S. McCrae
Kehan Wang
A. Zakhor
36
15
0
26 May 2021
DSANet: Dynamic Segment Aggregation Network for Video-Level
  Representation Learning
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning
Wenhao Wu
Yuxiang Zhao
Yanwu Xu
Xiao Tan
Dongliang He
...
Jinxing Ye
Yingying Li
Mingde Yao
Zichao Dong
Yifeng Shi
AI4TS
30
27
0
25 May 2021
Temporal Action Proposal Generation with Transformers
Temporal Action Proposal Generation with Transformers
Lining Wang
Haosen Yang
Wenhao Wu
H. Yao
Hujie Huang
ViT
33
27
0
25 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video
  Understanding
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
26
129
0
20 May 2021
MutualNet: Adaptive ConvNet via Mutual Learning from Different Model
  Configurations
MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations
Taojiannan Yang
Sijie Zhu
Matías Mendieta
Pu Wang
Ravikumar Balakrishnan
Minwoo Lee
T. Han
M. Shah
Cheng Chen
3DH
OOD
28
22
0
14 May 2021
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor
  Segmentation
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation
Tianrui Hui
Shaofei Huang
Si Liu
Zihan Ding
Guanbin Li
Wenguan Wang
Jizhong Han
Fei Wang
20
45
0
14 May 2021
REGINA - Reasoning Graph Convolutional Networks in Human Action
  Recognition
REGINA - Reasoning Graph Convolutional Networks in Human Action Recognition
Bruno Degardin
Vasco Lopes
Hugo Proencca
3DH
GNN
38
10
0
14 May 2021
Designing Multimodal Datasets for NLP Challenges
Designing Multimodal Datasets for NLP Challenges
James Pustejovsky
E. Holderness
Jingxuan Tu
Parker Glenn
Kyeongmin Rim
Kelley Lynch
R. Brutti
23
5
0
12 May 2021
Temporal-Spatial Feature Pyramid for Video Saliency Detection
Temporal-Spatial Feature Pyramid for Video Saliency Detection
Qinyao Chang
Shiping Zhu
41
27
0
10 May 2021
Adaptive Focus for Efficient Video Recognition
Adaptive Focus for Efficient Video Recognition
Yulin Wang
Zhaoxi Chen
Haojun Jiang
Shiji Song
Yizeng Han
Gao Huang
45
98
0
07 May 2021
Motion-Augmented Self-Training for Video Recognition at Smaller Scale
Motion-Augmented Self-Training for Video Recognition at Smaller Scale
Kirill Gavrilyuk
Mihir Jain
I. Karmanov
Cees G. M. Snoek
18
21
0
04 May 2021
BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video
  Person Re-Identification
BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video Person Re-Identification
Rui Hou
Hong Chang
Bingpeng Ma
Rui Huang
Shiguang Shan
27
85
0
30 Apr 2021
Three-stream network for enriched Action Recognition
Three-stream network for enriched Action Recognition
Ivaxi Sheth
19
4
0
27 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw
  Video, Audio and Text
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
251
577
0
22 Apr 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
170
170
0
20 Apr 2021
HCMS: Hierarchical and Conditional Modality Selection for Efficient
  Video Recognition
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition
Zejia Weng
Zuxuan Wu
Hengduo Li
Jingjing Chen
Yu-Gang Jiang
26
4
0
20 Apr 2021
Temporal Query Networks for Fine-grained Video Understanding
Temporal Query Networks for Fine-grained Video Understanding
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
24
83
0
19 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip
  Retrieval
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
329
781
0
18 Apr 2021
Adaptive Intermediate Representations for Video Understanding
Adaptive Intermediate Representations for Video Understanding
Juhana Kangaspunta
A. Piergiovanni
Rico Jonschkowski
Michael S. Ryoo
A. Angelova
26
3
0
14 Apr 2021
Video Question Answering with Phrases via Semantic Roles
Video Question Answering with Phrases via Semantic Roles
Arka Sadhu
Kan Chen
Ram Nevatia
16
15
0
08 Apr 2021
Progressive Temporal Feature Alignment Network for Video Inpainting
Progressive Temporal Feature Alignment Network for Video Inpainting
Xueyan Zou
Linjie Yang
Ding Liu
Yong Jae Lee
14
56
0
08 Apr 2021
ACM-Net: Action Context Modeling Network for Weakly-Supervised Temporal
  Action Localization
ACM-Net: Action Context Modeling Network for Weakly-Supervised Temporal Action Localization
Sanqing Qu
Guang Chen
Zhijun Li
Lijun Zhang
Fan Lu
Alois C. Knoll
17
54
0
07 Apr 2021
CCSNet: a deep learning modeling suite for CO$_2$ storage
CCSNet: a deep learning modeling suite for CO2_22​ storage
Gege Wen
C. Hay
S. Benson
35
73
0
05 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
39
1,128
0
01 Apr 2021
Adaptive Configuration of In Situ Lossy Compression for Cosmology
  Simulations via Fine-Grained Rate-Quality Modeling
Adaptive Configuration of In Situ Lossy Compression for Cosmology Simulations via Fine-Grained Rate-Quality Modeling
Sian Jin
Jesus Pulido
Pascal Grosset
Jiannan Tian
Dingwen Tao
J. Ahrens
30
22
0
01 Apr 2021
Rethinking Self-supervised Correspondence Learning: A Video Frame-level
  Similarity Perspective
Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective
Jiarui Xu
Xiaolong Wang
VOS
40
92
0
31 Mar 2021
Broaden Your Views for Self-Supervised Video Learning
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
33
127
0
30 Mar 2021
Augmented Transformer with Adaptive Graph for Temporal Action Proposal
  Generation
Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation
Shuning Chang
Pichao Wang
F. Wang
Hao Li
Jiashi Feng
ViT
42
41
0
30 Mar 2021
ViViT: A Video Vision Transformer
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
30
2,088
0
29 Mar 2021
Busy-Quiet Video Disentangling for Video Classification
Busy-Quiet Video Disentangling for Video Classification
Guoxi Huang
A. Bors
28
6
0
29 Mar 2021
No frame left behind: Full Video Action Recognition
No frame left behind: Full Video Action Recognition
X. Liu
S. Pintea
F. Karimi Nejadasl
Olaf Booij
Jan van Gemert
19
40
0
29 Mar 2021
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text
  Retrieval
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval
Song Liu
Haoqi Fan
Shengsheng Qian
Yiru Chen
Wenkui Ding
Zhongyuan Wang
30
145
0
28 Mar 2021
Catalyzing Clinical Diagnostic Pipelines Through Volumetric Medical
  Image Segmentation Using Deep Neural Networks: Past, Present, & Future
Catalyzing Clinical Diagnostic Pipelines Through Volumetric Medical Image Segmentation Using Deep Neural Networks: Past, Present, & Future
Teofilo E. Zosa
OOD
18
0
0
27 Mar 2021
A Comprehensive Review of the Video-to-Text Problem
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
13
17
0
27 Mar 2021
Learning Comprehensive Motion Representation for Action Recognition
Learning Comprehensive Motion Representation for Action Recognition
Mingyu Wu
Boyuan Jiang
Donghao Luo
Junchi Yan
Yabiao Wang
Ying Tai
Chengjie Wang
Jilin Li
Feiyue Huang
Xiaokang Yang
16
12
0
23 Mar 2021
AdaSGN: Adapting Joint Number and Model Size for Efficient
  Skeleton-Based Action Recognition
AdaSGN: Adapting Joint Number and Model Size for Efficient Skeleton-Based Action Recognition
Lei Shi
Yifan Zhang
Jian Cheng
Hanqing Lu
30
46
0
22 Mar 2021
MoViNets: Mobile Video Networks for Efficient Video Recognition
MoViNets: Mobile Video Networks for Efficient Video Recognition
Dan Kondratyuk
Liangzhe Yuan
Yandong Li
Li Zhang
Mingxing Tan
Matthew A. Brown
Boqing Gong
13
228
0
21 Mar 2021
Previous
123...1011121389
Next