Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.04851
Cited By
v1
v2 (latest)
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 657 papers shown
Title
Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding
Rong Gao
Xin Liu
Bohao Xing
Zitong Yu
Björn W. Schuller
Heikki Kälviäinen
155
3
0
21 May 2024
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
Yingjie Zhai
Wenshuo Li
Yehui Tang
Xinghao Chen
Yunhe Wang
ViT
66
0
0
14 May 2024
DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model
Yang Jin
Jun Lv
Shuqiang Jiang
Cewu Lu
127
1
0
12 May 2024
Deep video representation learning: a survey
Elham Ravanbakhsh
Yongqing Liang
J. Ramanujam
Xin Li
80
3
0
10 May 2024
Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation
Mo Guan
Yan Wang
Guangkun Ma
Jiarui Liu
Mingzu Sun
SLR
75
7
0
09 May 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
60
1
0
09 May 2024
Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method
Peisong He
Leyao Zhu
Jiaxing Li
Shiqi Wang
Haoliang Li
EGVM
85
3
0
07 May 2024
A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News
Zhe Niu
Ronglai Zuo
Brian Mak
Fangyun Wei
55
6
0
02 May 2024
SFMViT: SlowFast Meet ViT in Chaotic World
Jiaying Lin
Jiajun Wen
Mengyuan Liu
Jinfu Liu
Baiqiao Yin
Yue Li
ViT
64
1
0
25 Apr 2024
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang
Sule Bai
Guangyi Chen
Lei Chen
Jiwen Lu
Junle Wang
Yansong Tang
103
10
0
22 Apr 2024
STMixer: A One-Stage Sparse Action Detector
Tao Wu
Mengqing Cao
Ziteng Gao
Gangshan Wu
Limin Wang
83
0
0
15 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
138
4
0
06 Apr 2024
A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection
Chih-Chung Hsu
Chia-Ming Lee
Chiang Fan Yang
Yi-Shiuan Chou
Chih-Yu Jiang
Shen-Chieh Tai
Chin-Han Tsai
82
1
0
02 Apr 2024
LORD: Large Models based Opposite Reward Design for Autonomous Driving
Xin Ye
Feng Tao
Abhirup Mallik
Burhaneddin Yaman
Liu Ren
OffRL
114
5
0
27 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
74
2
0
24 Mar 2024
MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
Hanlei Zhang
Xin Wang
Hua Xu
Qianrui Zhou
Kai Gao
Jianhua Su
jinyue Zhao
Wenrui Li
Yanting Chen
137
5
0
16 Mar 2024
RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training
Zhixiu Lu
Hailong Li
N. Parikh
Jonathan R. Dillman
Lili He
MedIm
VLM
129
1
0
15 Mar 2024
On the Utility of 3D Hand Poses for Action Recognition
Md Salman Shamil
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
77
6
0
14 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
135
78
0
14 Mar 2024
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling
W. G. C. Bandara
Vishal M. Patel
VPVLM
VLM
78
1
0
11 Mar 2024
A spatiotemporal style transfer algorithm for dynamic visual stimulus generation
Antonino Greco
Markus Siegel
70
2
0
07 Mar 2024
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
Jun Xiong
Peng Zhang
Tao You
Chuanyue Li
Wei Huang
Yufei Zha
DiffM
85
5
0
02 Mar 2024
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
Yuanyuan Mao
Xin Lin
Qin Ni
Liang He
79
4
0
12 Feb 2024
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
Yufei Wang
Zhanyi Sun
Jesse Zhang
Zhou Xian
Erdem Biyik
David Held
Zackory M. Erickson
VLM
122
59
0
06 Feb 2024
SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks
Xingning Dong
Qingpei Guo
Tian Gan
Qing Wang
Jianlong Wu
Xiangyuan Ren
Yuan Cheng
Wei Chu
63
5
0
31 Jan 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
Florentin Wörgötter
Alexander S. Ecker
129
6
0
29 Jan 2024
Synchformer: Efficient Synchronization from Sparse Cues
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
88
14
0
29 Jan 2024
WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
Shuokang Huang
Kaihan Li
Di You
Yichong Chen
Arvin Lin
Siying Liu
Xiaohui Li
Julie A. McCann
76
10
0
24 Jan 2024
SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning
Hao Chen
Jiaze Wang
Ziyu Guo
Jinpeng Li
Donghao Zhou
Bian Wu
Chenyong Guan
Guangyong Chen
Pheng-Ann Heng
97
6
0
22 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
108
7
0
18 Jan 2024
Transformer-based Video Saliency Prediction with High Temporal Dimension Decoding
Morteza Moradi
S. Palazzo
C. Spampinato
58
3
0
15 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Jie Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
161
1
0
15 Jan 2024
SnapCap: Efficient Snapshot Compressive Video Captioning
Jianqiao Sun
Yudi Su
Hao Zhang
Ziheng Cheng
Zequn Zeng
Zhengjue Wang
Bo Chen
Xin Yuan
136
1
0
10 Jan 2024
Multi-Stage Contrastive Regression for Action Quality Assessment
Qi An
Mengshi Qi
Huadong Ma
65
4
0
05 Jan 2024
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Ziyi Bai
Ruiping Wang
Xilin Chen
163
8
0
03 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
222
100
0
29 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
88
5
0
21 Dec 2023
Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition
Fan Yu
Haoxu Wang
Ziyang Ma
Shiliang Zhang
93
2
0
14 Dec 2023
Generative Model-based Feature Knowledge Distillation for Action Recognition
Guiqin Wang
Peng Zhao
Yanjiang Shi
Cong Zhao
Shusen Yang
VLM
72
3
0
14 Dec 2023
ConFormer: A Novel Collection of Deep Learning Models to Assist Cardiologists in the Assessment of Cardiac Function
Ethan Thomas
Salman Aslam
MedIm
55
0
0
13 Dec 2023
Combined Scheduling, Memory Allocation and Tensor Replacement for Minimizing Off-Chip Data Accesses of DNN Accelerators
Yi Li
Aarti Gupta
Sharad Malik
29
1
0
30 Nov 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
152
2
0
30 Nov 2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye
Jiaming Zhou
Hui Xiong
Junwei Liang
ViT
43
1
0
29 Nov 2023
F4D: Factorized 4D Convolutional Neural Network for Efficient Video-level Representation Learning
Mohammad Al-Saad
Lakshmish Ramaswamy
S. Bhandarkar
AI4TS
38
1
0
28 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
59
6
0
27 Nov 2023
MoVideo: Motion-Aware Video Generation with Diffusion Models
Christos Sakaridis
Yuchen Fan
Kai Zhang
Radu Timofte
Luc Van Gool
Rakesh Ranjan
DiffM
VGen
85
10
0
19 Nov 2023
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models
.Ilker Kesen
Andrea Pedrotti
Mustafa Dogan
Michele Cafagna
Emre Can Acikgoz
...
Iacer Calixto
Anette Frank
Albert Gatt
Aykut Erdem
Erkut Erdem
94
19
0
13 Nov 2023
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
CLIP
VLM
VGen
106
2
0
30 Oct 2023
RoboCLIP: One Demonstration is Enough to Learn Robot Policies
Sumedh Anand Sontakke
Jesse Zhang
Sébastien M. R. Arnold
Karl Pertsch
Erdem Biyik
Dorsa Sadigh
Chelsea Finn
Laurent Itti
OffRL
71
74
0
11 Oct 2023
MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
Jingyuan Qi
Minqian Liu
Ying Shen
Zhiyang Xu
Lifu Huang
LRM
VGen
82
2
0
08 Oct 2023
Previous
1
2
3
4
5
...
12
13
14
Next