Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1711.11248
Cited By
A Closer Look at Spatiotemporal Convolutions for Action Recognition
30 November 2017
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Closer Look at Spatiotemporal Convolutions for Action Recognition"
50 / 477 papers shown
Title
CNN-Based Action Recognition and Pose Estimation for Classifying Animal Behavior from Videos: A Survey
Michael Perez
Corey Toler-Franklin
MedIm
36
14
0
15 Jan 2023
Deep Diversity-Enhanced Feature Representation of Hyperspectral Images
Jinhui Hou
Zhiyu Zhu
Junhui Hou
Hui Liu
Huanqiang Zeng
Deyu Meng
18
6
0
15 Jan 2023
Triple-stream Deep Metric Learning of Great Ape Behavioural Actions
Otto Brookes
Majid Mirmehdi
H. Kühl
T. Burghardt
31
14
0
06 Jan 2023
EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding
Shuhan Tan
Tushar Nagarajan
Kristen Grauman
26
21
0
05 Jan 2023
Ego-Only: Egocentric Action Detection without Exocentric Transferring
Huiyu Wang
Mitesh Singh
Lorenzo Torresani
EgoV
72
24
0
03 Jan 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Guohao Li
AAML
34
8
0
03 Jan 2023
StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition
Xi Shen
Zhedong Zheng
Yi Yang
SLR
30
13
0
25 Dec 2022
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
J. Denize
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
SSL
19
6
0
21 Dec 2022
A Survey on Human Action Recognition
Zhou Shuchang
29
0
0
20 Dec 2022
Cross-Modal Learning with 3D Deformable Attention for Action Recognition
Sangwon Kim
Dasom Ahn
ByoungChul Ko
ViT
3DPC
38
24
0
12 Dec 2022
Multimodal Vision Transformers with Forced Attention for Behavior Analysis
Tanay Agrawal
Michal Balazia
Philippe Muller
Franccois Brémond
ViT
23
9
0
07 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
36
54
0
06 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
39
0
0
03 Dec 2022
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Chen Zhao
Shuming Liu
K. Mangalam
Guohao Li
38
17
0
25 Nov 2022
Towards Good Practices for Missing Modality Robust Action Recognition
Sangmin Woo
Sumin Lee
Yeonju Park
Muhammad Adi Nugroho
Changick Kim
22
43
0
25 Nov 2022
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Tsu-jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
VGen
61
37
0
23 Nov 2022
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
Guoxi Huang
A. Bors
27
1
0
23 Nov 2022
MagicVideo: Efficient Video Generation With Latent Diffusion Models
Daquan Zhou
Weimin Wang
Hanshu Yan
Weiwei Lv
Yizhe Zhu
Jiashi Feng
DiffM
VGen
39
373
0
20 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
30
107
0
17 Nov 2022
Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022
Yin-Dong Zheng
Guo Chen
Jiahao Wang
Tong Lu
Liming Wang
40
0
0
16 Nov 2022
Dynamic Temporal Filtering in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Chong-Wah Ngo
Tao Mei
AI4TS
32
17
0
15 Nov 2022
Learning to Model Multimodal Semantic Alignment for Story Visualization
Bowen Li
Thomas Lukasiewicz
DiffM
31
2
0
14 Nov 2022
MARLIN: Masked Autoencoder for facial video Representation LearnINg
Zhixi Cai
Shreya Ghosh
Kalin Stefanov
Abhinav Dhall
Jianfei Cai
Hamid Rezatofighi
Reza Haffari
Munawar Hayat
ViT
CVBM
27
60
0
12 Nov 2022
Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
Hyolim Kang
Hanjung Kim
Joungbin An
Minsu Cho
Seon Joo Kim
38
5
0
11 Nov 2022
Eat-Radar: Continuous Fine-Grained Intake Gesture Detection Using FMCW Radar and 3D Temporal Convolutional Network with Attention
C. Wang
T. S. Kumar
W. de Raedt
Guido Camps
Hans Hallez
Bart Vanrumste
24
12
0
08 Nov 2022
Egocentric Audio-Visual Noise Suppression
Roshan S. Sharma
Weipeng He
Ju Lin
Egor Lakomkin
Yang Liu
Kaustubh Kalgaonkar
EgoV
24
1
0
07 Nov 2022
No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration
Jose Vargas-Quiros
Laura Cabrera-Quiros
Hayley Hung
29
1
0
01 Nov 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B. Rangrej
Kevin J Liang
Tal Hassner
James J. Clark
27
3
0
24 Oct 2022
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner
Mostafa Elhoushi
Jacob Kahn
James Hegarty
31
8
0
24 Oct 2022
Learning a Grammar Inducer from Massive Uncurated Instructional Videos
Songyang Zhang
Linfeng Song
Lifeng Jin
Haitao Mi
Kun Xu
Dong Yu
Jiebo Luo
38
5
0
22 Oct 2022
Cyclical Self-Supervision for Semi-Supervised Ejection Fraction Prediction from Echocardiogram Videos
Weihang Dai
Xuelong Li
Xinpeng Ding
Kwang-Ting Cheng
46
22
0
20 Oct 2022
MovieCLIP: Visual Scene Recognition in Movies
Digbalay Bose
Rajat Hebbar
Krishna Somandepalli
Haoyang Zhang
Huayu Chen
K. Cole-McLaughlin
Haoran Wang
Shrikanth Narayanan
CLIP
22
21
0
20 Oct 2022
STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition
Dasom Ahn
Sangwon Kim
H. Hong
ByoungChul Ko
ViT
28
97
0
14 Oct 2022
S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces
Eric N. D. Nguyen
Karan Goel
Albert Gu
Gordon W. Downs
Preey Shah
Tri Dao
S. Baccus
Christopher Ré
VLM
22
39
0
12 Oct 2022
DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition
Haodong Duan
Jiaqi Wang
Kai-xiang Chen
Dahua Lin
38
42
0
12 Oct 2022
Learning State-Aware Visual Representations from Audible Interactions
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
78
23
0
27 Sep 2022
Leveraging Self-Supervised Training for Unintentional Action Recognition
Enea Duka
Anna Kukleva
Bernt Schiele
35
1
0
23 Sep 2022
Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition
Anqi Zhu
Qiuhong Ke
Mingming Gong
James Bailey
41
21
0
21 Sep 2022
Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention
R Gnana Praveen
Eric Granger
P. Cardinal
CVBM
56
31
0
19 Sep 2022
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain
Francesco Ragusa
Antonino Furnari
G. Farinella
EgoV
43
24
0
19 Sep 2022
Moving from 2D to 3D: volumetric medical image classification for rectal cancer staging
Joohyun Lee
J. Oh
Inkyu Shin
You-sung Kim
D. Sohn
Tae-Sung Kim
In So Kweon
MedIm
29
4
0
13 Sep 2022
A Novel Self-Knowledge Distillation Approach with Siamese Representation Learning for Action Recognition
Duc-Quang Vu
T. Phung
Jia-Ching Wang
27
9
0
03 Sep 2022
Survey: Exploiting Data Redundancy for Optimization of Deep Learning
Jou-An Chen
Wei Niu
Bin Ren
Yanzhi Wang
Xipeng Shen
23
24
0
29 Aug 2022
Lane Change Classification and Prediction with Action Recognition Networks
Kai-Bin Liang
Jun Wang
A. Bhalerao
16
2
0
24 Aug 2022
Modality Mixer for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Yeonju Park
Muhammad Adi Nugroho
Changick Kim
26
10
0
24 Aug 2022
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
Zhi-Qi Cheng
Qianwen Dai
Siyao Li
Teruko Mitamura
Alexander G. Hauptmann
16
34
0
18 Aug 2022
Blockwise Temporal-Spatial Pathway Network
SeulGi Hong
Min-Kook Choi
24
1
0
05 Aug 2022
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
40
313
0
04 Aug 2022
Surgical Skill Assessment via Video Semantic Aggregation
Zhenqiang Li
Lin Gu
Weimin Wang
Ryosuke Nakamura
Yoichi Sato
28
13
0
04 Aug 2022
Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living
Zdravko Marinov
David Schneider
Alina Roitberg
Rainer Stiefelhagen
VGen
32
2
0
03 Aug 2022
Previous
1
2
3
4
5
6
...
8
9
10
Next