1908.02486
Cited By
STM: SpatioTemporal and Motion Encoding for Action Recognition
7 August 2019
Boyuan Jiang
Mengmeng Wang
Weihao Gan
Wei Wu
Junjie Yan
Papers citing
"STM: SpatioTemporal and Motion Encoding for Action Recognition"
50 / 75 papers shown
Conformal Predictions for Human Action Recognition with Vision-Language Models
Bary Tim
Fuchs Clément
Macq Benoît
VLM
51
0
0
10 Feb 2025
Continuous Sign Language Recognition Based on Motor attention mechanism and frame-level Self-distillation
Qidan Zhu
Jing Li
Fei Yuan
Quan Gan
SLR
53
3
0
29 Feb 2024
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
37
7
0
27 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
40
8
0
18 Jul 2023
Improve Video Representation with Temporal Adversarial Augmentation
Jinhao Duan
Quanfu Fan
Hao-Ran Cheng
Xiaoshuang Shi
Kaidi Xu
AAML
AI4TS
ViT
31
2
0
28 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
39
74
0
06 Apr 2023
MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition
Xiang Wang
Shiwei Zhang
Zhiwu Qing
Changxin Gao
Yingya Zhang
Deli Zhao
Nong Sang
24
40
0
03 Apr 2023
Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer
N. H. Phong
B. Ribeiro
29
15
0
17 Feb 2023
An end-to-end multi-scale network for action prediction in videos
Xiaofan Liu
Jianqin Yin
Yuanxi Sun
Zhicheng Zhang
Jin Tang
27
0
0
31 Dec 2022
DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera Based Activity Recognition
Santosh Kumar Yadav
Achleshwar Luthra
Esha Pahwa
K. Tiwari
Heena Rathore
Hari Mohan Pandey
Peter Corcoran
34
12
0
07 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
39
0
0
03 Dec 2022
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Chen Zhao
Shuming Liu
K. Mangalam
Guohao Li
38
17
0
25 Nov 2022
Can lies be faked? Comparing low-stakes and high-stakes deception video datasets from a Machine Learning perspective
M. Camara
Adriana Postal
Tomas Henrique Maul
Gustavo Henrique Paetzold
13
7
0
23 Nov 2022
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
Guoxi Huang
A. Bors
27
1
0
23 Nov 2022
Dynamic Temporal Filtering in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Chong-Wah Ngo
Tao Mei
AI4TS
32
17
0
15 Nov 2022
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
R. Devon Hjelm
Neel Joshi
Sarath Chandar
ViT
27
0
0
11 Nov 2022
Fully-attentive and interpretable: vision and video vision transformers for pain detection
Giacomo Fiorentini
Itir Onal Ertugrul
A. A. Salah
MedIm
ViT
21
2
0
27 Oct 2022
Motion Matters: A Novel Motion Modeling For Cross-View Gait Feature Learning
Jingqi Li
Jiaqi Gao
Yuzhen Zhang
Hongming Shan
Junping Zhang
CVBM
17
2
0
21 Oct 2022
Motion Aware Self-Supervision for Generic Event Boundary Detection
Ayush K. Rai
Tarun Krishna
J. Dietlmeier
Kevin McGuinness
Alan F. Smeaton
Noel E. O'Connor
34
2
0
11 Oct 2022
Efficient Attention-free Video Shift Transformers
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
ViT
29
1
0
23 Aug 2022
MAR: Masked Autoencoders for Efficient Action Recognition
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Xiang Wang
Yuehuang Wang
Yiliang Lv
Changxin Gao
Nong Sang
32
42
0
24 Jul 2022
VidConv: A modernized 2D ConvNet for Efficient Video Recognition
Chuong H. Nguyen
Su Huynh
Vinh Nguyen
Ngoc-Khanh Nguyen
ViT
27
3
0
08 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
103
93
0
04 Jul 2022
Motion Gait: Gait Recognition via Motion Excitation
Yunpeng Zhang
Zhengyou Wang
Shanna Zhuang
Hui Wang
CVBM
16
1
0
22 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
37
15
0
13 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
28
52
0
02 Jun 2022
TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng
Gedas Bertasius
ViT
35
91
0
04 Apr 2022
Gate-Shift-Fuse for Video Action Recognition
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
22
22
0
16 Mar 2022
Motion-driven Visual Tempo Learning for Video-based Action Recognition
Yuanzhong Liu
Junsong Yuan
Zhigang Tu
27
58
0
24 Feb 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
162
360
0
24 Jan 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
48
198
0
20 Jan 2022
Action Keypoint Network for Efficient Video Recognition
Xu Chen
Yahong Han
Xiaohan Wang
Yifang Sun
Yi Yang
3DPC
27
6
0
17 Jan 2022
Real-World Graph Convolution Networks (RW-GCNs) for Action Recognition in Smart Video Surveillance
Justin Sanchez
Christopher Neff
Hamed Tabkhi
GNN
30
9
0
15 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
47
238
0
12 Jan 2022
Representing Videos as Discriminative Sub-graphs for Action Recognition
Dong Li
Zhaofan Qiu
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
44
25
0
11 Jan 2022
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search
Yi Ding
Xinyu Gong
Junru Wu
Humphrey Shi
Zhicheng Yan
Zhangyang Wang
VGen
52
1
0
09 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Keli Zhang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
32
21
0
09 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
75
678
0
02 Dec 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
35
73
0
25 Nov 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
21
63
0
23 Nov 2021
Relational Self-Attention: What's Missing in Attention for Video Understanding
Manjin Kim
Heeseung Kwon
Chunyu Wang
Suha Kwak
Minsu Cho
ViT
27
28
0
02 Nov 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
21
24
0
27 Oct 2021
TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding
Zhengwei Wang
Qi She
A. Smolic
21
9
0
17 Oct 2021
TAda! Temporally-Adaptive Convolutions for Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Mingqian Tang
Ziwei Liu
M. Ang
43
49
0
12 Oct 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
152
362
0
17 Sep 2021
LIGAR: Lightweight General-purpose Action Recognition
Evgeny Izutov
15
3
0
30 Aug 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
Jiawei Chen
C. Ho
ViT
26
77
0
20 Aug 2021
Adaptive Recursive Circle Framework for Fine-grained Action Recognition
Hanxi Lin
Xinxiao Wu
Jiebo Luo
25
1
0
25 Jul 2021
EAN: Event Adaptive Network for Enhanced Action Recognition
Yuan Tian
Yichao Yan
Guangtao Zhai
G. Guo
Zhiyong Gao
35
41
0
22 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
27
543
0
30 Jun 2021