Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.04851
Cited By
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Z. Tu
Kevin Patrick Murphy
3DH
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 650 papers shown
Title
Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
Irfan Essa
SSL
23
3
0
03 Sep 2023
Self-Supervised Video Transformers for Isolated Sign Language Recognition
Marcelo Sandoval-Castaneda
Yanhong Li
D. Brentari
Karen Livescu
Gregory Shakhnarovich
SLR
23
2
0
02 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
34
20
0
27 Aug 2023
Attending Generalizability in Course of Deep Fake Detection by Exploring Multi-task Learning
P. Balaji
Abhijit Das
Srijan Das
A. Dantcheva
CVBM
21
4
0
25 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning
D. Fan
Jue Wang
Shuai Liao
Yi Zhu
Vimal Bhat
H. Santos-Villalobos
M. Rohith
Xinyu Li
VGen
37
19
0
24 Aug 2023
Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis
Nikolaos Gkalelis
Vasileios Mezaris
38
0
0
24 Aug 2023
NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos
Ziyuan Yang
Sucheng Ren
Zongwei Wu
Nanxuan Zhao
Junle Wang
Jing Qin
Shengfeng He
37
2
0
23 Aug 2023
Opening the Vocabulary of Egocentric Actions
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
43
16
0
22 Aug 2023
Temporal-Distributed Backdoor Attack Against Video Based Action Recognition
Xi Li
Songhe Wang
Rui Huang
Mahanth K. Gowda
G. Kesidis
AAML
39
6
0
21 Aug 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
Fangyun Wei
Yutong Chen
SLR
28
28
0
21 Aug 2023
Joint learning of images and videos with a single Vision Transformer
Shuki Shimizu
Toru Tamaki
ViT
19
0
0
21 Aug 2023
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning
Qianqian Wang
Junlong Du
Ke Yan
Shouhong Ding
VLM
38
17
0
09 Aug 2023
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment
Kun Yuan
Zishang Kong
Chuanchuan Zheng
Ming-Ting Sun
Xingsen Wen
ViT
32
14
0
31 Jul 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
37
7
0
27 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
40
8
0
18 Jul 2023
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
Hongfei Yan
Yang Liu
Yushen Wei
Zerui Li
Guanbin Li
Liang Lin
34
40
0
17 Jul 2023
TALL: Thumbnail Layout for Deepfake Video Detection
Yuting Xu
Jian Liang
Gengyun Jia
Ziming Yang
Yanhao Zhang
Ran He
ViT
54
53
0
14 Jul 2023
TVPR: Text-to-Video Person Retrieval and a New Benchmark
Fan Ni
Xu Zhang
Jianhui Wu
Guan-Nan Dong
Aichun Zhu
Hui Liu
Yue Zhang
48
0
0
14 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
54
19
0
13 Jul 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
39
89
0
11 Jul 2023
Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models
Wei Han
Hui Chen
MingSung Kan
Soujanya Poria
32
1
0
09 Jul 2023
Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos
Md Zahid Hasan
Jiajing Chen
Jiyang Wang
Mohammed Shaiqur Rahman
Ameya Joshi
Senem Velipasalar
C. Hegde
Anuj Sharma
S. Sarkar
VLM
49
18
0
16 Jun 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Aman Chadha
Srijan Das
ViT
33
4
0
15 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
35
22
0
06 Jun 2023
Masked Autoencoder for Unsupervised Video Summarization
Minho Shim
Taeoh Kim
Jinhyung Kim
Dongyoon Wee
28
1
0
02 Jun 2023
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
Sanjoy Kundu
Shubham Trehan
Sathyanarayanan N. Aakur
LRM
LM&Ro
27
1
0
26 May 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Thanh-Dat Truong
Khoa Luu
EgoV
32
10
0
25 May 2023
TG-VQA: Ternary Game of Video Question Answering
Hao Li
Peng Jin
Ze-Long Cheng
Songyang Zhang
Kai-xiang Chen
Zhennan Wang
Chang-rui Liu
Jie Chen
26
10
0
17 May 2023
Lightweight Delivery Detection on Doorbell Cameras
Pirazh Khorramshahi
Zhe Wu
Tianchen Wang
Luke Deluccia
Hongcheng Wang
14
0
0
13 May 2023
Visual Tuning
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
51
38
0
10 May 2023
Improve Video Representation with Temporal Adversarial Augmentation
Jinhao Duan
Quanfu Fan
Hao-Ran Cheng
Xiaoshuang Shi
Kaidi Xu
AAML
AI4TS
ViT
31
2
0
28 Apr 2023
SSTM: Spatiotemporal Recurrent Transformers for Multi-frame Optical Flow Estimation
Fisseha Admasu Ferede
M. Balasubramanian
21
3
0
26 Apr 2023
MRSN: Multi-Relation Support Network for Video Action Detection
Yin-Dong Zheng
Guo Chen
Minglei Yuan
Tong Lu
36
8
0
24 Apr 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
S. Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Hang-Rui Hu
Yu-Gang Jiang
30
35
0
20 Apr 2023
Pretrained Language Models as Visual Planners for Human Assistance
Dhruvesh Patel
H. Eghbalzadeh
Nitin Kamra
Michael L. Iuzzolino
Unnat Jain
Ruta Desai
LM&Ro
19
24
0
17 Apr 2023
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision
Jiani Huang
Ziyang Li
Mayur Naik
Ser-Nam Lim
37
3
0
15 Apr 2023
Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment
Kai Zhao
Kun Yuan
Ming-Ting Sun
Xingsen Wen
21
20
0
13 Apr 2023
AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection
Wentao Zhu
Yufang Huang
Xi Xie
Wenxian Liu
Jincan Deng
Debing Zhang
Zhangyang Wang
Ji Liu
37
16
0
12 Apr 2023
Scallop: A Language for Neurosymbolic Programming
Ziyang Li
Jiani Huang
Mayur Naik
ReLM
LRM
NAI
24
30
0
10 Apr 2023
Hyperspectral Image Super-Resolution via Dual-domain Network Based on Hybrid Convolution
Tingting Liu
Yuan Liu
Chun-liang Zhang
Liyin Yuan
Xiubao Sui
Qian Chen
SupR
24
18
0
10 Apr 2023
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
Ziteng Gao
Zhan Tong
Limin Wang
Mike Zheng Shou
33
9
0
07 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
39
74
0
06 Apr 2023
Sketch-based Video Object Localization
Sangmin Woo
So-Yeong Jeon
Jinyoung Park
Minji Son
Sumin Lee
Changick Kim
19
0
0
02 Apr 2023
DOAD: Decoupled One Stage Action Detection Network
Shuning Chang
Pichao Wang
Fan Wang
Jiashi Feng
Mike Zheng Show
26
4
0
01 Apr 2023
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
Yiwu Zhong
Licheng Yu
Yang Bai
Shangwen Li
Xueting Yan
Yin Li
AI4TS
38
32
0
31 Mar 2023
Streaming Video Model
Yucheng Zhao
Chong Luo
Chuanxin Tang
Dongdong Chen
Noel Codella
Zhengjun Zha
36
12
0
30 Mar 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
40
7
0
29 Mar 2023
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
25
3
0
28 Mar 2023
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
Ryo Hachiuma
Fumiaki Sato
Taiki Sekii
3DPC
29
37
0
27 Mar 2023
Learning Action Changes by Measuring Verb-Adverb Textual Relationships
Davide Moltisanti
Frank Keller
Hakan Bilen
Laura Sevilla-Lara
31
7
0
27 Mar 2023
Previous
1
2
3
4
5
6
...
11
12
13
Next