1712.04851
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
Papers citing
"Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 657 papers shown
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization
Xiaoqi Wang
Yi Wang
Lap-Pui Chau
30
0
0
17 Jun 2025
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models
Tung M. Luu
Younghwan Lee
Donghoon Lee
Sunho Kim
Min Jun Kim
Chang D. Yoo
ALM
VLM
23
0
0
15 Jun 2025
An Effective End-to-End Solution for Multimodal Action Recognition
Songping Wang
Xiantao Hu
Yueming Lyu
Caifeng Shan
70
0
0
11 Jun 2025
Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation
Edward Fish
Richard Bowden
SLR
27
1
0
30 May 2025
Unsupervised Transcript-assisted Video Summarization and Highlight Detection
Spyros Barbakos
Charalampos Antoniadis
Gerasimos Potamianos
Gianluca Setti
OffRL
AI4TS
137
0
0
29 May 2025
CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
25
0
0
26 May 2025
Advancing Video Self-Supervised Learning via Image Foundation Models
Jingwei Wu
Zhewei Huang
Chang Liu
44
0
0
25 May 2025
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
Jiahui Zhang
Yusen Luo
Abrar Anwar
Sumedh Anand Sontakke
Joseph J Lim
Jesse Thomason
Erdem Biyik
Jesse Zhang
OffRL
LM&Ro
126
0
0
16 May 2025
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture
Meng Cui
Xianghu Yue
Xinyuan Qian
Jinzheng Zhao
Haohe Liu
Xubo Liu
Daoliang Li
Wenwu Wang
136
0
0
21 Apr 2025
Text-Audio-Visual-conditioned Diffusion Model for Video Saliency Prediction
Li Yu
Xuanzhe Sun
Wei Zhou
Moncef Gabbouj
DiffM
71
0
0
19 Apr 2025
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak N. Araabi
474
0
0
14 Apr 2025
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Shristi Das Biswas
Efstathia Soufleri
Arani Roy
Kaushik Roy
116
0
0
17 Mar 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
89
0
0
11 Feb 2025
Imitation Learning from a Single Temporally Misaligned Video
William Huey
Huaxiaoyue Wang
Anne Wu
Yoav Artzi
Sanjiban Choudhury
AI4TS
99
0
0
08 Feb 2025
BILLNET: A Binarized Conv3D-LSTM Network with Logic-gated residual architecture for hardware-efficient video inference
Van Thien Nguyen
William Guicquero
Gilles Sicard
3DV
MQ
145
2
0
24 Jan 2025
WhACC: Whisker Automatic Contact Classifier with Expert Human-Level Performance
Phillip Maire
Samson G. King
Jonathan Andrew Cheung
Stefanie Walker
Samuel Andrew Hires
194
0
0
06 Jan 2025
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda
Andrea Piergentili
Beatrice Savoldi
Marco Madeddu
Martina Rosola
Silvia Casola
Chiara Ferrando
V. Patti
Matteo Negri
L. Bentivogli
88
5
0
31 Dec 2024
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
Yulin Wang
Haoji Zhang
Yang Yue
Shiji Song
Chao Deng
Junlan Feng
Gao Huang
123
4
0
15 Dec 2024
Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression
Ali Mollaahmadi Dehaghi
Reza Razavi
Mohammad Moshirpour
123
1
0
12 Dec 2024
Relevance-guided Audio Visual Fusion for Video Saliency Prediction
Li Yu
Xuanzhe Sun
Pan Gao
Moncef Gabbouj
130
1
0
18 Nov 2024
AM Flow: Adapters for Temporal Processing in Action Recognition
Tanay Agrawal
Abid Ali
A. Dantcheva
François Brémond
73
0
0
04 Nov 2024
MM-WLAuslan: Multi-View Multi-Modal Word-Level Australian Sign Language Recognition Dataset
Xin Shen
Heming Du
Hongwei Sheng
Shuyun Wang
Hui Chen
...
Xiaobiao Du
Jiaying Ying
Ruihan Lu
Qingzheng Xu
Xin Yu
SLR
59
7
0
25 Oct 2024
GenAI Assisting Medical Training
Stefan Gerd Fritsch
Matthias Tschoepe
Vitor Fortes Rey
Lars Krupp
Agnes Gruenerbl
Eloise Monger
Sarah Travenna
31
0
0
21 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
173
1
0
09 Oct 2024
Grounding is All You Need? Dual Temporal Grounding for Video Dialog
You Qin
Wei Ji
Xinze Lan
Hao Fei
Xun Yang
Dan Guo
Roger Zimmermann
Lizi Liao
VGen
83
0
0
08 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
Michael R. Lyu
Liwei Wang
VLM
48
0
0
08 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Joey Tianyi Zhou
Koustuv Sinha
AI4TS
110
5
0
04 Oct 2024
REST-HANDS: Rehabilitation with Egocentric Vision Using Smartglasses for Treatment of Hands after Surviving Stroke
Wiktor Mucha
Kentaro Tanaka
M. Kampel
82
0
0
30 Sep 2024
Temporally Aligned Audio for Video with Autoregression
Ilpo Viertola
Vladimir E. Iashin
Esa Rahtu
VGen
81
13
0
20 Sep 2024
High-Order Evolving Graphs for Enhanced Representation of Traffic Dynamics
Aditya Humnabadkar
Arindam Sikdar
Benjamin Cave
Huaizhong Zhang
P. Bakaki
Ardhendu Behera
95
0
0
17 Sep 2024
KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance
Jingxian Lu
Wenke Xia
Dong Wang
Zhigang Wang
Bin Zhao
Di Hu
Xuelong Li
80
3
0
06 Aug 2024
Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis?
Habib Hajimolahoseini
Walid Ahmed
Austin Wen
Yang Liu
71
0
0
23 Jul 2024
Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective
Changwen Zheng
Wenwen Qiang
Jianqi Zhang
Changwen Zheng
Jingyao Wang
SSL
111
0
0
19 Jul 2024
Pose-guided multi-task video transformer for driver action recognition
Ricardo Pizarro
Roberto Valle
L. Bergasa
J. M. Buenaposada
Luis Baumela
ViT
68
0
0
18 Jul 2024
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serrà
DiffM
VGen
95
16
0
15 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
124
9
0
11 Jul 2024
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li
Zhenhua Feng
Tianyang Xu
Linze Li
Xiao-Jun Wu
Muhammad Awais
Sara Atito
Josef Kittler
CoGe
86
6
0
08 Jul 2024
Open-Event Procedure Planning in Instructional Videos
Yilu Wu
Hanlin Wang
Jing Wang
Limin Wang
93
1
0
06 Jul 2024
IntentionNet: Map-Lite Visual Navigation at the Kilometre Scale
Wei Gao
Bo Ai
Joel Loo
Vinay
David Hsu
125
1
0
03 Jul 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
76
5
0
03 Jul 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
94
4
0
21 Jun 2024
PrAViC: Probabilistic Adaptation Framework for Real-Time Video Classification
Magdalena Trędowicz
Łukasz Struski
Marcin Mazur
Szymon Janusz
Arkadiusz Lewicki
Jacek Tabor
79
1
0
17 Jun 2024
Video Frame Interpolation for Polarization via Swin-Transformer
Feng Huang
Xin Zhang
Yixuan Xu
Xuesong Wang
Xianyu Wu
75
0
0
17 Jun 2024
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
Weichao Zhao
Wengang Zhou
Hezhen Hu
Min Wang
Houqiang Li
SLR
100
3
0
15 Jun 2024
MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition
Stefan Gerd Fritsch
Cennet Oğuz
Vitor Fortes Rey
L. Ray
Maximilian Kiefer-Emmanouilidis
Paul Lukowicz
HAI
116
0
0
06 Jun 2024
RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model
Khaled Alomar
Halil Ibrahim Aysel
Xiaohao Cai
MedIm
ViT
83
9
0
02 Jun 2024
MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition
Weichao Zhao
Hezhen Hu
Wen-gang Zhou
Yunyao Mao
Min Wang
Houqiang Li
SLR
83
10
0
31 May 2024
Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics
Minttu Alakuijala
Reginald McLean
Isaac Woungang
Nariman Farsad
Samuel Kaski
Pekka Marttinen
Kai Yuan
LM&Ro
66
1
0
30 May 2024
Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks
Mohit Prabhushankar
Ghassan AlRegib
UQCV
77
0
0
22 May 2024
Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance
Kaifeng Zhang
Zhao-Heng Yin
Weirui Ye
Yang Gao
153
4
0
22 May 2024