1712.04851
Cited By
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
Papers citing
"Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 657 papers shown
Title
Sketch-based Video Object Localization
Sangmin Woo
So-Yeong Jeon
Jinyoung Park
Minji Son
Sumin Lee
Changick Kim
91
0
0
02 Apr 2023
DOAD: Decoupled One Stage Action Detection Network
Shuning Chang
Pichao Wang
Fan Wang
Jiashi Feng
Mike Zheng Shou
65
4
0
01 Apr 2023
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
Yiwu Zhong
Licheng Yu
Yang Bai
Shangwen Li
Xueting Yan
Yin Li
AI4TS
106
34
0
31 Mar 2023
Streaming Video Model
Yucheng Zhao
Chong Luo
Chuanxin Tang
DongDong Chen
Noel Codella
Zhengjun Zha
79
13
0
30 Mar 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
112
7
0
29 Mar 2023
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
55
3
0
28 Mar 2023
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
Ryo Hachiuma
Fumiaki Sato
Taiki Sekii
3DPC
75
39
0
27 Mar 2023
Learning Action Changes by Measuring Verb-Adverb Textual Relationships
Davide Moltisanti
Frank Keller
Hakan Bilen
Laura Sevilla-Lara
112
7
0
27 Mar 2023
A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition
Andong Deng
Taojiannan Yang
Chong Chen
AI4TS
109
15
0
23 Mar 2023
Natural Language-Assisted Sign Language Recognition
Ronglai Zuo
Fangyun Wei
Brian Mak
SLR
74
44
0
21 Mar 2023
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
Fida Mohammad Thoker
Hazel Doughty
Cees G. M. Snoek
ViT
100
9
0
20 Mar 2023
Dual-path Adaptation from Image to Video Transformers
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
ViT
82
38
0
17 Mar 2023
Video Action Recognition with Attentive Semantic Units
Yifei Chen
Dapeng Chen
Ruijin Liu
Hao Li
Wei Peng
69
11
0
17 Mar 2023
CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
Jun Xiong
Gang Wang
Peng Zhang
Wei Huang
Yufei Zha
Guangtao Zhai
52
14
0
11 Mar 2023
TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test Questions
He Zhu
Xihua Li
Xuemin Zhao
Yunbo Cao
Shan Yu
25
0
0
09 Mar 2023
Improving Video Retrieval by Adaptive Margin
Feng He
Qi Wang
Zhifan Feng
Wenbin Jiang
Yajuan Lü
Yong Zhu
Xiao Tan
135
22
0
09 Mar 2023
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
Yimeng Zhang
Xin Chen
Jinghan Jia
Sijia Liu
Ke Ding
96
27
0
09 Mar 2023
Continuity-Aware Latent Interframe Information Mining for Reliable UAV Tracking
Changhong Fu
Mutian Cai
Sihang Li
Kunhan Lu
Haobo Zuo
Chongjun Liu
89
5
0
08 Mar 2023
Continuous Sign Language Recognition with Correlation Network
Lianyu Hu
Liqing Gao
Zekang Liu
Wei Feng
SLR
114
67
0
06 Mar 2023
Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition
Junyan Wang
Zhenhong Sun
Yichen Qian
Dong Gong
Xiuyu Sun
Ming Lin
Maurice Pagnucco
Yang Song
3DPC
59
11
0
05 Mar 2023
Temporal Coherent Test-Time Optimization for Robust Video Classification
Chenyu Yi
Siyuan Yang
Yufei Wang
Haoliang Li
Yap-Peng Tan
Alex C. Kot
TTA
82
13
0
28 Feb 2023
Contrastive Video Question Answering via Video Graph Transformer
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
110
37
0
27 Feb 2023
Deep Learning for Video-Text Retrieval: a Review
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
75
18
0
24 Feb 2023
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
Weihong Zhong
Mao Zheng
Duyu Tang
Xuan Luo
Heng Gong
Xiaocheng Feng
Bing Qin
103
8
0
20 Feb 2023
Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer
N. H. Phong
B. Ribeiro
68
17
0
17 Feb 2023
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection
C. Nwoye
Tong Yu
Saurav Sharma
Aditya Murali
Deepak Alapatt
...
Pietro Mascagni
B. Seeliger
Cristians Gonzalez
Didier Mutter
N. Padoy
102
20
0
13 Feb 2023
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer
Min Peng
Chongyang Wang
Yu Shi
Xiang-Dong Zhou
ViT
82
7
0
04 Feb 2023
Learning Large-scale Neural Fields via Context Pruned Meta-Learning
Jihoon Tack
Subin Kim
Sihyun Yu
Jaeho Lee
Jinwoo Shin
Jonathan Richard Schwarz
87
9
0
01 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
Yizhen Chen
Jie Wang
Lijian Lin
Zhongang Qi
Jin Ma
Ying Shan
VLM
85
23
0
30 Jan 2023
Semi-Parametric Video-Grounded Text Generation
Sungdong Kim
Jin-Hwa Kim
Jiyoung Lee
Minjoon Seo
VGen
80
14
0
27 Jan 2023
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Ruyang Liu
Jingjia Huang
Ge Li
Jiashi Feng
Xing Wu
Thomas H. Li
AI4TS
CLIP
VLM
112
48
0
26 Jan 2023
Gated-ViGAT: Efficient Bottom-Up Event Recognition and Explanation Using a New Frame Selection Policy and Gating Mechanism
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
48
4
0
18 Jan 2023
Building Scalable Video Understanding Benchmarks through Sports
Aniket Agarwal
Alex Zhang
Karthik Narasimhan
Igor Gilitschenski
Vishvak Murahari
Yash Kant
55
1
0
17 Jan 2023
TinyHD: Efficient Video Saliency Prediction with Heterogeneous Decoders using Hierarchical Maps Distillation
Feiyan Hu
S. Palazzo
Federica Proietto Salanitri
Giovanni Bellitto
Morteza Moradi
C. Spampinato
Kevin McGuinness
65
10
0
11 Jan 2023
Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification Dataset using Manipulating Conditional Style Translation
Hilmil Pradana
Minh-Son Dao
K. Zettsu
59
5
0
06 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
113
59
0
05 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
105
4
0
05 Jan 2023
Test of Time: Instilling Video-Language Models with a Sense of Time
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
183
37
0
05 Jan 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Guohao Li
AAML
127
9
0
03 Jan 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
174
53
0
31 Dec 2022
An end-to-end multi-scale network for action prediction in videos
Xiaofan Liu
Jianqin Yin
Yuanxi Sun
Zhicheng Zhang
Jin Tang
59
0
0
31 Dec 2022
StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition
Xi Shen
Zhedong Zheng
Yi Yang
SLR
95
14
0
25 Dec 2022
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
J. Denize
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
SSL
87
6
0
21 Dec 2022
MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning
Yuan Liu
Jiacheng Chen
Hao Wu
87
2
0
21 Dec 2022
Cross-Modal Learning with 3D Deformable Attention for Action Recognition
Sangwon Kim
Dasom Ahn
ByoungChul Ko
ViT
3DPC
71
26
0
12 Dec 2022
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Joey Tianyi Zhou
Gedas Bertasius
VLM
125
81
0
09 Dec 2022
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
68
51
0
09 Dec 2022
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
Jie Jiang
Zhimin Li
Jiangfeng Xiong
Rongwei Quan
Qinglin Lu
Wei Liu
84
2
0
09 Dec 2022
DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera Based Activity Recognition
Santosh Kumar Yadav
Achleshwar Luthra
Esha Pahwa
K. Tiwari
Heena Rathore
Hari Mohan Pandey
Peter Corcoran
78
14
0
07 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
84
58
0
06 Dec 2022