arXiv: 2201.04676
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
12 January 2022
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
Papers citing "UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning"
50 / 141 papers shown
Kernel Dynamic Mode Decomposition For Sparse Reconstruction of Closable Koopman Operators
Nishant Panda
Himanshu Singh
J. Nathan Kutz
29
0
0
11 May 2025
CANet: ChronoAdaptive Network for Enhanced Long-Term Time Series Forecasting under Non-Stationarity
Mert Sonmezer
Seyda Ertekin
AI4TS
26
0
0
24 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
41
0
0
02 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
57
0
0
01 Apr 2025
Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions
Thinesh Thiyakesan Ponbagavathi
Alina Roitberg
39
0
0
31 Mar 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
51
0
0
30 Mar 2025
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
Zichen Liu
Kunlun Xu
Bing-Huang Su
Xu Zou
Yuxin Peng
Jiahuan Zhou
VLM
AI4TS
65
1
0
20 Mar 2025
A Real-Time Human Action Recognition Model for Assisted Living
Yixuan Wang
Paul Stynes
Pramod Pathak
Cristina Muntean
34
0
0
18 Mar 2025
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Shristi Das Biswas
Efstathia Soufleri
Arani Roy
Kaushik Roy
59
0
0
17 Mar 2025
UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection
Xin Jin
Haisheng Su
Kai Liu
Cong Ma
Wei Yu Wu
Fei Hui
Junchi Yan
Mamba
77
0
0
15 Mar 2025
AthletePose3D: A Benchmark Dataset for 3D Human Pose Estimation and Kinematic Validation in Athletic Movements
Calvin Yeung
Tomohiro Suzuki
Ryota Tanaka
Zhuoer Yin
Keisuke Fujii
3DH
68
1
0
10 Mar 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang
Wonmin Byeon
Jiarui Xu
Jinwei Gu
Ka Chun Cheung
Xiaolong Wang
Kai Han
Jan Kautz
Sifei Liu
146
0
0
21 Jan 2025
Reinforcement Learning from Wild Animal Videos
Elliot Chane-Sane
Constant Roux
O. Stasse
Nicolas Mansard
164
0
0
05 Dec 2024
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zenghui Ding
Xianjun Yang
Yining Sun
189
1
0
21 Nov 2024
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Jintao Zhang
Haofeng Huang
Pengle Zhang
Jia wei
Jun-Jie Zhu
Jianfei Chen
VLM
MQ
63
2
0
17 Nov 2024
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
40
0
0
12 Nov 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
82
18
0
03 Oct 2024
Let Video Teaches You More: Video-to-Image Knowledge Distillation using DEtection TRansformer for Medical Video Lesion Detection
Yuncheng Jiang
Zixun Zhang
Jun Wei
Chun-Mei Feng
Guanbin Li
Xiang Wan
Shuguang Cui
Zhen Li
ViT
MedIm
29
1
0
26 Aug 2024
MPT-PAR:Mix-Parameters Transformer for Panoramic Activity Recognition
Wenqing Gan
Yaoyu Li
Jian Li
Zhangang Lin
ViT
30
0
0
01 Aug 2024
Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference
Qian Liang
Yan Chen
Yang Hu
CLL
42
2
0
19 Jul 2024
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li
Zhenhua Feng
Tianyang Xu
Linze Li
Xiao-Jun Wu
Muhammad Awais
Sara Atito
Josef Kittler
CoGe
52
5
0
08 Jul 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
44
4
0
21 Jun 2024
Infinite-Dimensional Feature Interaction
Chenhui Xu
Fuxun Yu
Maoliang Li
Zihao Zheng
Zirui Xu
Jinjun Xiong
Xiang Chen
34
1
0
22 May 2024
Towards Gradient-based Time-Series Explanations through a SpatioTemporal Attention Network
Min Hun Lee
AI4TS
ViT
FAtt
24
3
0
18 May 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
28
1
0
09 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen
LM&Ro
81
36
0
06 May 2024
CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention
Damith Chamalke Senadeera
Xiaoyun Yang
Dimitrios Kollias
Gregory G. Slabaugh
32
0
0
27 Apr 2024
Data-independent Module-aware Pruning for Hierarchical Vision Transformers
Yang He
Joey Tianyi Zhou
ViT
44
3
0
21 Apr 2024
TSLANet: Rethinking Transformers for Time Series Representation Learning
Emadeldeen Eldele
Mohamed Ragab
Zhenghua Chen
Min-man Wu
Xiaoli Li
AI4TS
AIFin
36
35
0
12 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
60
56
0
04 Apr 2024
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Ali Behrouz
Michele Santacatterina
Ramin Zabih
44
31
0
29 Mar 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
69
14
0
26 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
36
1
0
24 Mar 2024
CoReEcho: Continuous Representation Learning for 2D+time Echocardiography Analysis
F. Maani
Numan Saeed
Aleksandr Matsun
Mohammad Yaqub
SyDa
60
3
0
15 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
73
0
14 Mar 2024
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia
Xinliang Wang
Feng Lv
Xin Hao
Yifeng Shi
ViT
26
45
0
12 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
35
180
0
11 Mar 2024
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
Dan Guo
Kun Li
Bin Hu
Yan Zhang
Meng Wang
57
38
0
08 Mar 2024
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Q. Garrido
Jean Ponce
Xinlei Chen
Michael G. Rabbat
Yann LeCun
Mahmoud Assran
Nicolas Ballas
MDE
VLM
89
73
0
15 Feb 2024
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
Yuanyuan Mao
Xin Lin
Qin Ni
Liang He
21
3
0
12 Feb 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
F. Worgotter
Alexander S. Ecker
28
3
0
29 Jan 2024
GTAutoAct: An Automatic Datasets Generation Framework Based on Game Engine Redevelopment for Action Recognition
Xingyu Song
Zhan Li
Shi Chen
K. Demachi
27
1
0
24 Jan 2024
Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts
Kiyoon Kim
Shreyank N. Gowda
Panagiotis Eustratiadis
Antreas Antoniou
Robert B Fisher
37
2
0
21 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
32
0
0
10 Jan 2024
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
27
4
0
08 Jan 2024
Explore Human Parsing Modality for Action Recognition
Jinfu Liu
Runwei Ding
Yuhang Wen
Nan Dai
Fanyang Meng
Shen Zhao
Mengyuan Liu
25
7
0
04 Jan 2024
IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
Jiarui Xu
Yossi Gandelsman
Amir Bar
Jianwei Yang
Jianfeng Gao
Trevor Darrell
Xiaolong Wang
VLM
21
3
0
04 Dec 2023
D^2ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Wenjie Pei
Qizhong Tan
Guangming Lu
Jiandong Tian
41
3
0
03 Dec 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition
Dongho Lee
Jongseo Lee
Jinwoo Choi
EgoV
35
12
0
30 Nov 2023
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Ao Ma
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
30
15
0
30 Oct 2023