ResearchTrend.AI
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Z. Tu
Kevin Patrick Murphy
    3DH
arXiv: 1712.04851

Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"

50 / 650 papers shown
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture
Meng Cui
Xianghu Yue
Xinyuan Qian
Jinzheng Zhao
Haohe Liu
Xubo Liu
Daoliang Li
Wenwu Wang
31
0
0
21 Apr 2025
Text-Audio-Visual-conditioned Diffusion Model for Video Saliency Prediction
Li Yu
Xuanzhe Sun
Wei Zhou
Moncef Gabbouj
DiffM
29
0
0
19 Apr 2025
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak Nadjar Araabi
143
0
0
14 Apr 2025
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Shristi Das Biswas
Efstathia Soufleri
Arani Roy
Kaushik Roy
59
0
0
17 Mar 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
45
0
0
11 Feb 2025
Imitation Learning from a Single Temporally Misaligned Video
William Huey
Huaxiaoyue Wang
Anne Wu
Yoav Artzi
Sanjiban Choudhury
AI4TS
60
0
0
08 Feb 2025
BILLNET: A Binarized Conv3D-LSTM Network with Logic-gated residual architecture for hardware-efficient video inference
Van Thien Nguyen
William Guicquero
Gilles Sicard
3DV
MQ
79
2
0
24 Jan 2025
WhACC: Whisker Automatic Contact Classifier with Expert Human-Level Performance
Phillip Maire
Samson G. King
Jonathan Andrew Cheung
Stefanie Walker
Samuel Andrew Hires
46
0
0
06 Jan 2025
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda
Andrea Piergentili
Beatrice Savoldi
Marco Madeddu
Martina Rosola
Silvia Casola
Chiara Ferrando
V. Patti
Matteo Negri
L. Bentivogli
45
1
0
31 Dec 2024
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
Yulin Wang
Haoji Zhang
Yang Yue
Shiji Song
Chao Deng
Junlan Feng
Gao Huang
79
3
0
15 Dec 2024
Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression
Ali Mollaahmadi Dehaghi
Reza Razavi
Mohammad Moshirpour
77
1
0
12 Dec 2024
Relevance-guided Audio Visual Fusion for Video Saliency Prediction
Li Yu
Xuanzhe Sun
Pan Gao
Moncef Gabbouj
68
1
0
18 Nov 2024
AM Flow: Adapters for Temporal Processing in Action Recognition
Tanay Agrawal
Abid Ali
A. Dantcheva
François Brémond
39
0
0
04 Nov 2024
MM-WLAuslan: Multi-View Multi-Modal Word-Level Australian Sign Language Recognition Dataset
Xin Shen
Heming Du
Hongwei Sheng
Shuyun Wang
Hui Chen
...
Xiaobiao Du
Jiaying Ying
Ruihan Lu
Qingzheng Xu
Xin Yu
SLR
36
3
0
25 Oct 2024
GenAI Assisting Medical Training
Stefan Gerd Fritsch
Matthias Tschoepe
Vitor Fortes Rey
Lars Krupp
Agnes Gruenerbl
Eloise Monger
Sarah Travenna
16
0
0
21 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
87
1
0
09 Oct 2024
Grounding is All You Need? Dual Temporal Grounding for Video Dialog
You Qin
Wei Ji
Xinze Lan
Hao Fei
Xun Yang
Dan Guo
Roger Zimmermann
Lizi Liao
VGen
41
0
0
08 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
M. Lyu
Liwei Wang
VLM
33
0
0
08 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Joey Tianyi Zhou
Koustuv Sinha
AI4TS
57
3
0
04 Oct 2024
REST-HANDS: Rehabilitation with Egocentric Vision Using Smartglasses for Treatment of Hands after Surviving Stroke
Wiktor Mucha
Kentaro Tanaka
M. Kampel
42
0
0
30 Sep 2024
Temporally Aligned Audio for Video with Autoregression
Ilpo Viertola
Vladimir E. Iashin
Esa Rahtu
VGen
50
11
0
20 Sep 2024
High-Order Evolving Graphs for Enhanced Representation of Traffic Dynamics
Aditya Humnabadkar
Arindam Sikdar
Benjamin Cave
Huaizhong Zhang
P. Bakaki
Ardhendu Behera
38
0
0
17 Sep 2024
KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance
Jingxian Lu
Wenke Xia
Dong Wang
Zhigang Wang
Bin Zhao
Di Hu
Xuelong Li
46
3
0
06 Aug 2024
Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis?
Habib Hajimolahoseini
Walid Ahmed
Austin Wen
Yang Liu
29
0
0
23 Jul 2024
Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective
Changwen Zheng
Wenwen Qiang
Jianqi Zhang
Changwen Zheng
Jingyao Wang
SSL
66
0
0
19 Jul 2024
Pose-guided multi-task video transformer for driver action recognition
Ricardo Pizarro
Roberto Valle
L. Bergasa
J. M. Buenaposada
Luis Baumela
ViT
40
0
0
18 Jul 2024
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serrà
DiffM
VGen
47
13
0
15 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
42
7
0
11 Jul 2024
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li
Zhenhua Feng
Tianyang Xu
Linze Li
Xiao-Jun Wu
Muhammad Awais
Sara Atito
Josef Kittler
CoGe
60
5
0
08 Jul 2024
Open-Event Procedure Planning in Instructional Videos
Yilu Wu
Hanlin Wang
Jing Wang
Limin Wang
57
0
0
06 Jul 2024
IntentionNet: Map-Lite Visual Navigation at the Kilometre Scale
Wei Gao
Bo Ai
Joel Loo
Vinay
David Hsu
49
1
0
03 Jul 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
40
4
0
03 Jul 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
54
4
0
21 Jun 2024
PrAViC: Probabilistic Adaptation Framework for Real-Time Video Classification
Magdalena Trędowicz
Łukasz Struski
Marcin Mazur
Szymon Janusz
Arkadiusz Lewicki
Jacek Tabor
28
1
0
17 Jun 2024
Video Frame Interpolation for Polarization via Swin-Transformer
Feng Huang
Xin Zhang
Yixuan Xu
Xuesong Wang
Xianyu Wu
29
0
0
17 Jun 2024
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
Weichao Zhao
Wengang Zhou
Hezhen Hu
Min Wang
Houqiang Li
SLR
37
2
0
15 Jun 2024
MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition
Stefan Gerd Fritsch
Cennet Oğuz
Vitor Fortes Rey
L. Ray
Maximilian Kiefer-Emmanouilidis
Paul Lukowicz
HAI
53
0
0
06 Jun 2024
RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model
Khaled Alomar
Halil Ibrahim Aysel
Xiaohao Cai
MedIm
ViT
43
7
0
02 Jun 2024
MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition
Weichao Zhao
Hezhen Hu
Wen-gang Zhou
Yunyao Mao
Min Wang
Houqiang Li
SLR
44
8
0
31 May 2024
Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics
Minttu Alakuijala
Reginald McLean
Isaac Woungang
Nariman Farsad
Samuel Kaski
Pekka Marttinen
Kai Yuan
LM&Ro
42
1
0
30 May 2024
Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks
Mohit Prabhushankar
Ghassan AlRegib
UQCV
29
0
0
22 May 2024
Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance
Kaifeng Zhang
Zhao-Heng Yin
Weirui Ye
Yang Gao
70
3
0
22 May 2024
Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding
Rong Gao
Xin Liu
Bohao Xing
Zitong Yu
Björn W. Schuller
Heikki Kälviäinen
57
3
0
21 May 2024
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
Yingjie Zhai
Wenshuo Li
Yehui Tang
Xinghao Chen
Yunhe Wang
ViT
30
0
0
14 May 2024
DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model
Yang Jin
Jun Lv
Shuqiang Jiang
Cewu Lu
48
1
0
12 May 2024
Deep video representation learning: a survey
Elham Ravanbakhsh
Yongqing Liang
J. Ramanujam
Xin Li
49
3
0
10 May 2024
Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation
Mo Guan
Yan Wang
Guangkun Ma
Jiarui Liu
Mingzu Sun
SLR
43
6
0
09 May 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
38
1
0
09 May 2024
Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method
Peisong He
Leyao Zhu
Jiaxing Li
Shiqi Wang
Haoliang Li
EGVM
23
2
0
07 May 2024
A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News
Zhe Niu
Ronglai Zuo
Brian Mak
Fangyun Wei
21
5
0
02 May 2024