A Closer Look at Spatiotemporal Convolutions for Action Recognition

30 November 2017
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri

Papers citing "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

50 / 1,270 papers shown
Title
Point-Voxel Absorbing Graph Representation Learning for Event Stream based Recognition
Bowei Jiang
Chengguo Yuan
Tianlin Li
Zhimin Bao
Lin Zhu
Yonghong Tian
Bin Luo
GNN
3DPC
26
4
0
08 Jun 2023
Atrial Septal Defect Detection in Children Based on Ultrasound Video Using Multiple Instances Learning
Yiman Liu
Qingming Huang
Xiaoxiang Han
Tongtong Liang
Zhi-fang Zhang
...
Angelos Stefanidis
Jionglong Su
Jiangang Chen
Qingli Li
Yuqi Zhang
25
7
0
06 Jun 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Yingying Fan
Yu Wu
Bo Du
Yutian Lin
34
8
0
01 Jun 2023
fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs
Petros Toupas
C. Bouganis
Dimitrios Tzovaras
19
3
0
31 May 2023
VIPriors 3: Visual Inductive Priors for Data-Efficient Deep Learning Challenges
Robert-Jan Bruintjes
A. Lengyel
Marcos Baptista-Rios
O. Kayhan
Davide Zambrano
Nergis Tomen
Jan van Gemert
25
9
0
31 May 2023
Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks
L. Tóth
Amin Honarmandi Shandiz
G. Gosztolya
T. Csapó
24
3
0
30 May 2023
FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition
Petros Toupas
C. Bouganis
Dimitrios Tzovaras
10
4
0
29 May 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
26
10
0
27 May 2023
Detecting Heart Disease from Multi-View Ultrasound Images via Supervised Attention Multiple Instance Learning
Zhe Huang
B. Wessler
M. C. Hughes
30
4
0
25 May 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Thanh-Dat Truong
Khoa Luu
EgoV
41
10
0
25 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
103
77
0
22 May 2023
HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with Cross-person Memory Transformer
Y. Kim
Dong Won Lee
Paul Pu Liang
Sharifa Alghowinem
C. Breazeal
Hae Won Park
37
4
0
21 May 2023
Lightweight Delivery Detection on Doorbell Cameras
Pirazh Khorramshahi
Zhe Wu
Tianchen Wang
Luke Deluccia
Hongcheng Wang
14
0
0
13 May 2023
MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition
Xinyu Gong
S. Mohan
Naina Dhingra
Jean-Charles Bazin
Yilei Li
Zhangyang Wang
Rakesh Ranjan
EgoV
56
18
0
12 May 2023
LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition
Fuyan Ma
Bin Sun
Shutao Li
ViT
27
20
0
05 May 2023
ItoV: Efficiently Adapting Deep Learning-based Image Watermarking to Video Watermarking
Guanhui Ye
Jiashi Gao
Yuchen Wang
Liyan Song
Xue-Ming Wei
35
3
0
04 May 2023
SSTM: Spatiotemporal Recurrent Transformers for Multi-frame Optical Flow Estimation
Fisseha Admasu Ferede
M. Balasubramanian
24
3
0
26 Apr 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
S. Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
33
35
0
20 Apr 2023
A baseline on continual learning methods for video action recognition
Giulia Castagnolo
C. Spampinato
Francesco Rundo
Daniela Giordano
S. Palazzo
CLL
32
2
0
20 Apr 2023
Multimodal Group Activity Dataset for Classroom Engagement Level Prediction
Alpay Sabuncuoglu
T. Metin Sezgin
11
3
0
18 Apr 2023
SViTT: Temporal Learning of Sparse Video-Text Transformers
Yi Li
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
31
12
0
18 Apr 2023
Conditional Generation of Audio from Video via Foley Analogies
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
25
38
0
17 Apr 2023
Recursive Joint Attention for Audio-Visual Fusion in Regression based Emotion Recognition
R Gnana Praveen
Eric Granger
P. Cardinal
27
10
0
17 Apr 2023
Robust Cross-Modal Knowledge Distillation for Unconstrained Videos
Wenke Xia
Xingjian Li
Andong Deng
Haoyi Xiong
Dejing Dou
Di Hu
19
5
0
16 Apr 2023
Skeleton-based action analysis for ADHD diagnosis
Yichun Li
Yi Li
R. Nair
S. M. Naqvi
20
2
0
14 Apr 2023
PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition
Ruiqi Xian
Xijun Wang
D. Kothandaraman
Tianyi Zhou
23
7
0
14 Apr 2023
Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment
Kai Zhao
Kun Yuan
Ming-Ting Sun
Xingsen Wen
21
20
0
13 Apr 2023
Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention
Yiming Ma
Victor Sanchez
S. Nikan
Devesh Upadhyay
Bhushan Atote
T. Guha
22
2
0
13 Apr 2023
RECLIP: Resource-efficient CLIP by Training with Small Images
Runze Li
Dahun Kim
B. Bhanu
Weicheng Kuo
VLM
CLIP
36
13
0
12 Apr 2023
VARS: Video Assistant Referee System for Automated Soccer Decision Making from Multiple Views
Jan Held
A. Cioppa
Silvio Giancola
Abdullah Hamdi
Guohao Li
Marc Van Droogenbroeck
27
29
0
10 Apr 2023
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
Ziteng Gao
Zhan Tong
Limin Wang
Mike Zheng Shou
33
9
0
07 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
39
74
0
06 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
42
20
0
05 Apr 2023
Black Box Few-Shot Adaptation for Vision-Language models
Yassine Ouali
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
VLM
39
31
0
04 Apr 2023
MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition
Xiang Wang
Shiwei Zhang
Zhiwu Qing
Changxin Gao
Yingya Zhang
Deli Zhao
Nong Sang
24
40
0
03 Apr 2023
Focalized Contrastive View-invariant Learning for Self-supervised Skeleton-based Action Recognition
Qianhui Men
Edmond S. L. Ho
Hubert P. H. Shum
Howard Leung
SSL
35
19
0
03 Apr 2023
Video Pretraining Advances 3D Deep Learning on Chest CT Tasks
Alexander Ke
Shih-Cheng Huang
Chloe P. O'Connell
M. Klimont
Serena Yeung
Pranav Rajpurkar
21
8
0
02 Apr 2023
DOAD: Decoupled One Stage Action Detection Network
Shuning Chang
Pichao Wang
Fan Wang
Jiashi Feng
Mike Zheng Shou
26
4
0
01 Apr 2023
Streaming Video Model
Yucheng Zhao
Chong Luo
Chuanxin Tang
Dongdong Chen
Noel Codella
Zhengjun Zha
36
12
0
30 Mar 2023
HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices
Petros Toupas
Alexander Montgomerie-Corcoran
C. Bouganis
Dimitrios Tzovaras
30
9
0
30 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
71
329
0
29 Mar 2023
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
I. Dave
Mamshad Nayeem Rizve
Chong Chen
M. Shah
TTA
44
16
0
28 Mar 2023
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
25
3
0
28 Mar 2023
Rethinking matching-based few-shot action recognition
Juliette Bertrand
Yannis Kalantidis
Giorgos Tolias
32
1
0
28 Mar 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
29
16
0
28 Mar 2023
SELF-VS: Self-supervised Encoding Learning For Video Summarization
Hojjat Mokhtarabadi
Kaveh Bahraman
M. Hosseinzadeh
M. Eftekhari
AI4TS
SSL
ViT
25
0
0
28 Mar 2023
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
Ryo Hachiuma
Fumiaki Sato
Taiki Sekii
3DPC
29
37
0
27 Mar 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
41
95
0
25 Mar 2023
Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition
Jun Cen
Shiwei Zhang
Xiang Wang
Yixuan Pei
Zhiwu Qing
Yingya Zhang
Qifeng Chen
34
3
0
25 Mar 2023
A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition
Andong Deng
Taojiannan Yang
Chong Chen
AI4TS
27
13
0
23 Mar 2023