A Closer Look at Spatiotemporal Convolutions for Action Recognition

30 November 2017
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri

Papers citing "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

50 / 1,270 papers shown
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar
Yongqin Xian
A. Tonioni
Andrew Zisserman
Federico Tombari
38
12
0
19 Dec 2023
Deep Learning Approaches for Seizure Video Analysis: A Review
David Ahmedt-Aristizabal
M. Armin
Zeeshan Hayder
Norberto Garcia-Cairasco
Lars Petersson
Clinton Fookes
Simon Denman
A. McGonigal
32
21
0
18 Dec 2023
Benchmarks for Physical Reasoning AI
Andrew Melnik
Robin Schiewer
Moritz Lange
Andrei Muresanu
Mozhgan Saeidi
Animesh Garg
Helge J. Ritter
29
8
0
17 Dec 2023
Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition
Fan Yu
Haoxu Wang
Ziyang Ma
Shiliang Zhang
57
2
0
14 Dec 2023
Generative Model-based Feature Knowledge Distillation for Action Recognition
Guiqin Wang
Peng Zhao
Yanjiang Shi
Cong Zhao
Shusen Yang
VLM
49
3
0
14 Dec 2023
ConFormer: A Novel Collection of Deep Learning Models to Assist Cardiologists in the Assessment of Cardiac Function
Ethan Thomas
Salman Aslam
MedIm
34
0
0
13 Dec 2023
From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos
Yin Chen
Jia Li
Shiguang Shan
Meng Wang
Richang Hong
48
32
0
09 Dec 2023
MuRF: Multi-Baseline Radiance Fields
Haofei Xu
Anpei Chen
Yuedong Chen
Daniel Gehrig
Yulun Zhang
Marc Pollefeys
Andreas Geiger
Fisher Yu
18
26
0
07 Dec 2023
Low-power, Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann
Suman Ghosh
Ignacio Juarez Martinez
Tom Hart
Alex Kacelnik
Guillermo Gallego
32
7
0
06 Dec 2023
From Detection to Action Recognition: An Edge-Based Pipeline for Robot Human Perception
Petros Toupas
Georgios Tsamis
Dimitrios Giakoumis
K. Votis
Dimitrios Tzovaras
32
0
0
06 Dec 2023
D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Wenjie Pei
Qizhong Tan
Guangming Lu
Jiandong Tian
41
3
0
03 Dec 2023
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang
Yue Xu
Cewu Lu
Yong-Lu Li
DD
41
8
0
01 Dec 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition
Dongho Lee
Jongseo Lee
Jinwoo Choi
EgoV
35
12
0
30 Nov 2023
DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding
Kyungho Bae
Geo Ahn
Youngrae Kim
Jinwoo Choi
30
3
0
30 Nov 2023
Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models
Dong Li
Jiandong Jin
Yuhao Zhang
Yanlin Zhong
Yaoyang Wu
Lan Chen
Tianlin Li
Bin Luo
71
6
0
30 Nov 2023
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
23
8
0
30 Nov 2023
Combined Scheduling, Memory Allocation and Tensor Replacement for Minimizing Off-Chip Data Accesses of DNN Accelerators
Yi Li
Aarti Gupta
Sharad Malik
13
1
0
30 Nov 2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye
Jiaming Zhou
Hui Xiong
Junwei Liang
ViT
23
1
0
29 Nov 2023
F4D: Factorized 4D Convolutional Neural Network for Efficient Video-level Representation Learning
Mohammad Al-Saad
Lakshmish Ramaswamy
S. Bhandarkar
AI4TS
24
0
0
28 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
33
6
0
27 Nov 2023
Introducing SSBD+ Dataset with a Convolutional Pipeline for detecting Self-Stimulatory Behaviours in Children using raw videos
Vaibhavi Lokegaonkar
Vijay Jaisankar
Pon Deepika
Madhav Rao
T. Srikanth
Sarbani Mallick
Manjit Sodhi
11
1
0
25 Nov 2023
VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG
Yankun Xu
Junzhe Wang
Yun-Hsuan Chen
Jie Yang
Wenjie Ming
Shuangquan Wang
Mohamad Sawan
17
0
0
24 Nov 2023
Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Muhammad Adi Nugroho
Changick Kim
30
0
0
21 Nov 2023
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
Yating Xu
Conghui Hu
Gim Hee Lee
22
2
0
14 Nov 2023
ELF: An End-to-end Local and Global Multimodal Fusion Framework for Glaucoma Grading
Wenyun Li
Chi-Man Pun
14
1
0
14 Nov 2023
INCODE: Implicit Neural Conditioning with Prior Knowledge Embeddings
A. Kazerouni
Reza Azad
Alireza Hosseini
Dorit Merhof
Ulas Bagci
AI4CE
30
15
0
28 Oct 2023
Diversifying Spatial-Temporal Perception for Video Domain Generalization
Kun-Yu Lin
Jia-Run Du
Yipeng Gao
Jiaming Zhou
Wei-Shi Zheng
45
14
0
27 Oct 2023
Deepfake Detection: Leveraging the Power of 2D and 3D CNN Ensembles
Aagam Bakliwal
Amit D. Joshi
19
1
0
25 Oct 2023
Remote Heart Rate Monitoring in Smart Environments from Videos with Self-supervised Pre-training
Divij Gupta
Ali Etemad
47
2
0
23 Oct 2023
3M-TRANSFORMER: A Multi-Stage Multi-Stream Multimodal Transformer for Embodied Turn-Taking Prediction
Mehdi Fatan
Emanuele Mincato
Dimitra Pintzou
Mariella Dimiccoli
30
1
0
23 Oct 2023
ConViViT -- A Deep Neural Network Combining Convolutions and Factorized Self-Attention for Human Activity Recognition
Rachid Reda Dokkar
F. Chaieb
Hassen Drira
Arezki Aberkane
ViT
30
2
0
22 Oct 2023
On the Relevance of Temporal Features for Medical Ultrasound Video Recognition
D. H. Smith
J. P. Lineberger
G. H. Baker
8
2
0
16 Oct 2023
CM-PIE: Cross-modal perception for interactive-enhanced audio-visual video parsing
Yaru Chen
Ruohao Guo
Xubo Liu
Peipei Wu
Guangyao Li
Zhenbo Li
Wenwu Wang
34
7
0
11 Oct 2023
Boundary Discretization and Reliable Classification Network for Temporal Action Detection
Zhenying Fang
Jun Yu
Richang Hong
28
0
0
10 Oct 2023
Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination
Siyuan Jiang
Yan Ding
Yuling Wang
Lei Xu
Wenli Dai
...
Jie Yu
Jianqiao Zhou
Chunquan Zhang
Ping Liang
Dexing Kong
19
0
0
10 Oct 2023
Semantic-aware Temporal Channel-wise Attention for Cardiac Function Assessment
Guanqi Chen
Guanbin Li
11
0
0
09 Oct 2023
In the Blink of an Eye: Event-based Emotion Recognition
Haiwei Zhang
Jiqing Zhang
B. Dong
Pieter Peers
Wenwei Wu
Xiaopeng Wei
Felix Heide
Xin Yang
CVBM
32
12
0
06 Oct 2023
Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization
Edward Fish
Jon Weinbren
Andrew Gilbert
36
0
0
05 Oct 2023
FashionFlow: Leveraging Diffusion Models for Dynamic Fashion Video Synthesis from Static Imagery
Tasin Islam
A. Miron
Xiaohui Liu
Yongmin Li
DiffM
31
3
0
29 Sep 2023
A Survey on Deep Learning Techniques for Action Anticipation
Zeyun Zhong
Manuel Martin
Michael Voit
Juergen Gall
Jürgen Beyerer
26
7
0
29 Sep 2023
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning
Jinrong Zhang
Wu Wen
Sheng-lan Liu
Yunheng Li
Qifeng Li
Lin Feng
31
0
0
27 Sep 2023
Egocentric RGB+Depth Action Recognition in Industry-Like Settings
Jyoti Kini
Sarah Fleischer
I. Dave
Mubarak Shah
EgoV
31
2
0
25 Sep 2023
Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training
Jiangliu Wang
Jianbo Jiao
Yibing Song
Stephen James
Zhan Tong
Chongjian Ge
Pieter Abbeel
Yunhui Liu
20
0
0
25 Sep 2023
S3TC: Spiking Separated Spatial and Temporal Convolutions with Unsupervised STDP-based Learning for Action Recognition
Mireille el Assal
Pierre Tirilly
Ioan Marius Bilasco
26
2
0
22 Sep 2023
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification
Meng Liu
K. Liang
Dayu Hu
Hao Yu
Yue Liu
Lingyuan Meng
Wenxuan Tu
Sihang Zhou
Xinwang Liu
18
25
0
21 Sep 2023
Selective Volume Mixup for Video Action Recognition
Yi Tan
Zhaofan Qiu
Y. Hao
Ting Yao
Xiangnan He
Tao Mei
ViT
35
2
0
18 Sep 2023
A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism
I. Gurvich
Ido Leichter
Dharmendar Reddy Palle
Yossi Asher
Alon Vinnikov
Igor Abramovski
Vishak Gopal
Ross Cutler
Eyal Krupka
34
4
0
15 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
27
18
0
14 Sep 2023
TransNet: A Transfer Learning-Based Network for Human Action Recognition
Khaled Alomar
Xiaohao Cai
38
1
0
13 Sep 2023
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
Palaash Agrawal
Haidi Azaman
Cheston Tan
51
3
0
13 Sep 2023