ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1712.04851
  4. Cited By
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in
  Video Classification

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Z. Tu
Kevin Patrick Murphy
    3DH
ArXivPDFHTML

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 650 papers shown
Title
A Large-scale Study of Spatiotemporal Representation Learning with a New
  Benchmark on Action Recognition
A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition
Andong Deng
Taojiannan Yang
Chong Chen
AI4TS
27
13
0
23 Mar 2023
Natural Language-Assisted Sign Language Recognition
Natural Language-Assisted Sign Language Recognition
Ronglai Zuo
Fangyun Wei
Brian Mak
SLR
23
38
0
21 Mar 2023
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
Fida Mohammad Thoker
Hazel Doughty
Cees G. M. Snoek
ViT
48
9
0
20 Mar 2023
Dual-path Adaptation from Image to Video Transformers
Dual-path Adaptation from Image to Video Transformers
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
ViT
21
37
0
17 Mar 2023
Video Action Recognition with Attentive Semantic Units
Video Action Recognition with Attentive Semantic Units
Yifei Chen
Dapeng Chen
Ruijin Liu
Hao Li
Wei Peng
21
11
0
17 Mar 2023
CASP-Net: Rethinking Video Saliency Prediction from an
  Audio-VisualConsistency Perceptual Perspective
CASP-Net: Rethinking Video Saliency Prediction from an Audio-VisualConsistency Perceptual Perspective
Jun Xiong
Gang Wang
Peng Zhang
Wei Huang
Yufei Zha
Guangtao Zhai
29
14
0
11 Mar 2023
TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test
  Questions
TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test Questions
He Zhu
Xihua Li
Xuemin Zhao
Yunbo Cao
Shan Yu
15
0
0
09 Mar 2023
Improving Video Retrieval by Adaptive Margin
Improving Video Retrieval by Adaptive Margin
Feng He
Qi Wang
Zhifan Feng
Wenbin Jiang
Yajuan Lü
Yong Zhu
Xiao Tan
88
20
0
09 Mar 2023
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
Yimeng Zhang
Xin Chen
Jinghan Jia
Sijia Liu
Ke Ding
23
25
0
09 Mar 2023
Continuity-Aware Latent Interframe Information Mining for Reliable UAV
  Tracking
Continuity-Aware Latent Interframe Information Mining for Reliable UAV Tracking
Changhong Fu
Mutian Cai
Sihang Li
Kunhan Lu
Haobo Zuo
Chongjun Liu
22
5
0
08 Mar 2023
Continuous Sign Language Recognition with Correlation Network
Continuous Sign Language Recognition with Correlation Network
Lianyu Hu
Liqing Gao
Zekang Liu
Wei Feng
SLR
40
58
0
06 Mar 2023
Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video
  Recognition
Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition
Junyan Wang
Zhenhong Sun
Yichen Qian
Dong Gong
Xiuyu Sun
Ming Lin
M. Pagnucco
Yang Song
3DPC
20
11
0
05 Mar 2023
Temporal Coherent Test-Time Optimization for Robust Video Classification
Temporal Coherent Test-Time Optimization for Robust Video Classification
Chenyu Yi
Siyuan Yang
Yufei Wang
Haoliang Li
Yap-Peng Tan
Alex C. Kot
TTA
27
12
0
28 Feb 2023
Contrastive Video Question Answering via Video Graph Transformer
Contrastive Video Question Answering via Video Graph Transformer
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
27
35
0
27 Feb 2023
Deep Learning for Video-Text Retrieval: a Review
Deep Learning for Video-Text Retrieval: a Review
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
24
14
0
24 Feb 2023
STOA-VLP: Spatial-Temporal Modeling of Object and Action for
  Video-Language Pre-training
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
Weihong Zhong
Mao Zheng
Duyu Tang
Xuan Luo
Heng Gong
Xiaocheng Feng
Bing Qin
32
8
0
20 Feb 2023
Video Action Recognition Collaborative Learning with Dynamics via
  PSO-ConvNet Transformer
Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer
N. H. Phong
B. Ribeiro
29
15
0
17 Feb 2023
CholecTriplet2022: Show me a tool and tell me the triplet -- an
  endoscopic vision challenge for surgical action triplet detection
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection
C. Nwoye
Tong Yu
Saurav Sharma
Aditya Murali
Deepak Alapatt
...
Pietro Mascagni
B. Seeliger
Cristians Gonzalez
Didier Mutter
N. Padoy
32
17
0
13 Feb 2023
Efficient End-to-End Video Question Answering with Pyramidal Multimodal
  Transformer
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer
Min Peng
Chongyang Wang
Yu Shi
Xiang-Dong Zhou
ViT
50
7
0
04 Feb 2023
Learning Large-scale Neural Fields via Context Pruned Meta-Learning
Learning Large-scale Neural Fields via Context Pruned Meta-Learning
Jihoon Tack
Subin Kim
Sihyun Yu
Jaeho Lee
Jinwoo Shin
Jonathan Richard Schwarz
32
9
0
01 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text
  Retrieval
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
Yizhen Chen
Jie Wang
Lijian Lin
Zhongang Qi
Jin Ma
Ying Shan
VLM
30
18
0
30 Jan 2023
Semi-Parametric Video-Grounded Text Generation
Semi-Parametric Video-Grounded Text Generation
Sungdong Kim
Jin-Hwa Kim
Jiyoung Lee
Minjoon Seo
VGen
32
14
0
27 Jan 2023
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge
  Transferring
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Ruyang Liu
Jingjia Huang
Ge Li
Jiashi Feng
Xing Wu
Thomas H. Li
AI4TS
CLIP
VLM
32
47
0
26 Jan 2023
Gated-ViGAT: Efficient Bottom-Up Event Recognition and Explanation Using
  a New Frame Selection Policy and Gating Mechanism
Gated-ViGAT: Efficient Bottom-Up Event Recognition and Explanation Using a New Frame Selection Policy and Gating Mechanism
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
21
4
0
18 Jan 2023
Building Scalable Video Understanding Benchmarks through Sports
Building Scalable Video Understanding Benchmarks through Sports
Aniket Agarwal
Alex Zhang
Karthik Narasimhan
Igor Gilitschenski
Vishvak Murahari
Yash Kant
19
1
0
17 Jan 2023
TinyHD: Efficient Video Saliency Prediction with Heterogeneous Decoders
  using Hierarchical Maps Distillation
TinyHD: Efficient Video Saliency Prediction with Heterogeneous Decoders using Hierarchical Maps Distillation
Feiyan Hu
S. Palazzo
Federica Proietto Salanitri
Giovanni Bellitto
Morteza Moradi
C. Spampinato
Kevin McGuinness
18
9
0
11 Jan 2023
Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification
  Dataset using Manipulating Conditional Style Translation
Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification Dataset using Manipulating Conditional Style Translation
Hilmil Pradana
Minh-Son Dao
K. Zettsu
29
5
0
06 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
22
53
0
05 Jan 2023
What You Say Is What You Show: Visual Narration Detection in
  Instructional Videos
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
24
4
0
05 Jan 2023
Test of Time: Instilling Video-Language Models with a Sense of Time
Test of Time: Instilling Video-Language Models with a Sense of Time
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
86
36
0
05 Jan 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action
  Recognition
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Guohao Li
AAML
32
8
0
03 Jan 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition
  with Pre-trained Vision-Language Models
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
100
48
0
31 Dec 2022
An end-to-end multi-scale network for action prediction in videos
An end-to-end multi-scale network for action prediction in videos
Xiaofan Liu
Jianqin Yin
Yuanxi Sun
Zhicheng Zhang
Jin Tang
21
0
0
31 Dec 2022
StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language
  Recognition
StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition
Xi Shen
Zhedong Zheng
Yi Yang
SLR
30
13
0
25 Dec 2022
Similarity Contrastive Estimation for Image and Video Soft Contrastive
  Self-Supervised Learning
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
J. Denize
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
SSL
19
6
0
21 Dec 2022
MoQuad: Motion-focused Quadruple Construction for Video Contrastive
  Learning
MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning
Yuan Liu
Jiacheng Chen
Hao Wu
30
2
0
21 Dec 2022
Cross-Modal Learning with 3D Deformable Attention for Action Recognition
Cross-Modal Learning with 3D Deformable Attention for Action Recognition
Sangwon Kim
Dasom Ahn
ByoungChul Ko
ViT
3DPC
38
24
0
12 Dec 2022
VindLU: A Recipe for Effective Video-and-Language Pretraining
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Joey Tianyi Zhou
Gedas Bertasius
VLM
35
79
0
09 Dec 2022
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive
  Captioners
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
32
46
0
09 Dec 2022
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene
  Segmentation
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
Jie Jiang
Zhimin Li
Jiangfeng Xiong
Rongwei Quan
Qinglin Lu
Wei Liu
27
2
0
09 Dec 2022
DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera
  Based Activity Recognition
DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera Based Activity Recognition
Santosh Kumar Yadav
Achleshwar Luthra
Esha Pahwa
K. Tiwari
Heena Rathore
Hari Mohan Pandey
Peter Corcoran
34
12
0
07 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video
  Learning
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
36
54
0
06 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
39
0
0
03 Dec 2022
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu
Biaolong Chen
Yue Liao
Shuwen Xiao
Wenyu Sun
Xiaobo Li
Yousong Zhu
Jinqiao Wang
Si Liu
CLIP
25
11
0
02 Dec 2022
Query Efficient Cross-Dataset Transferable Black-Box Attack on Action
  Recognition
Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition
Rohit Gupta
Naveed Akhtar
Gaurav Kumar Nayak
Ajmal Mian
M. Shah
AAML
34
1
0
23 Nov 2022
Dynamic Appearance: A Video Representation for Action Recognition with
  Joint Training
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
Guoxi Huang
A. Bors
27
1
0
23 Nov 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative
  Latent Attention
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
24
9
0
21 Nov 2022
Generalizable Deepfake Detection with Phase-Based Motion Analysis
Generalizable Deepfake Detection with Phase-Based Motion Analysis
Ekta Prashnani
Michael Goebel
B. S. Manjunath
35
4
0
17 Nov 2022
Dynamic Temporal Filtering in Video Models
Dynamic Temporal Filtering in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Chong-Wah Ngo
Tao Mei
AI4TS
24
17
0
15 Nov 2022
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity
  Recognition
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
H. Haresamudram
Irfan Essa
20
5
0
08 Nov 2022
Previous
12345...111213
Next