1705.07750
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Papers citing
"Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 3,647 papers shown
An Empirical Study of End-to-End Temporal Action Detection
Xiaolong Liu
S. Bai
Xiang Bai
96
60
0
06 Apr 2022
Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zerui Li
Cheng Lu
Jia Qin
Chunle Guo
Ming-Ming Cheng
110
153
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
99
22
0
06 Apr 2022
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
Mingfei Han
David Junhao Zhang
Yali Wang
Rui Yan
L. Yao
Xiaojun Chang
Yu Qiao
72
56
0
05 Apr 2022
Detector-Free Weakly Supervised Group Activity Recognition
Dongkeun Kim
Jin S. Lee
Minsu Cho
Suha Kwak
ViT
77
44
0
05 Apr 2022
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos
Shao-Wei Liu
Subarna Tripathi
Somdeb Majumdar
Xiaolong Wang
EgoV
115
97
0
04 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
140
106
0
04 Apr 2022
TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng
Gedas Bertasius
ViT
120
94
0
04 Apr 2022
Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding
Ziyue Wu
Junyu Gao
Shucheng Huang
Changsheng Xu
84
4
0
04 Apr 2022
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
Huazhang Hu
Sixun Dong
Yiqun Zhao
Dongze Lian
Zhengxin Li
Shenghua Gao
89
52
0
03 Apr 2022
Quantized GAN for Complex Music Generation from Dance Videos
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
118
46
0
01 Apr 2022
Vision Transformer with Cross-attention by Temporal Shift for Efficient Action Recognition
Ryota Hashiguchi
Toru Tamaki
47
6
0
01 Apr 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
117
25
0
01 Apr 2022
Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization
Junyu Gao
Mengyuan Chen
Changsheng Xu
62
71
0
31 Mar 2022
Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
Feng Cheng
Ming Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Li
Wei Xia
63
17
0
31 Mar 2022
Controllable Augmentations for Video Representation Learning
Rui Qian
Weiyao Lin
John See
Dian Li
SSL
AI4TS
54
10
0
30 Mar 2022
CycDA: Unsupervised Cycle Domain Adaptation from Image to Video
Wei Lin
Anna Kukleva
Kunyang Sun
Horst Possegger
Hilde Kuehne
Horst Bischof
VGen
135
7
0
30 Mar 2022
StyleFool: Fooling Video Classification Systems via Style Transfer
Yu Cao
Xi Xiao
Ruoxi Sun
Derui Wang
Minhui Xue
Sheng Wen
AAML
131
26
0
30 Mar 2022
Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation
Guang Feng
Lihe Zhang
Zhiwei Hu
Huchuan Lu
VOS
116
4
0
30 Mar 2022
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification
Shi Pu
Kaili Zhao
Mao Zheng
VLM
76
20
0
29 Mar 2022
End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection
Congcong Li
Xinyao Wang
Longyin Wen
Dexiang Hong
Tiejian Luo
Libo Zhang
78
17
0
29 Mar 2022
SPAct: Self-supervised Privacy Preservation for Action Recognition
I. Dave
Chong Chen
M. Shah
PICV
74
59
0
29 Mar 2022
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
Bo He
Xitong Yang
Le Kang
Zhiyu Cheng
Xingfa Zhou
Abhinav Shrivastava
79
81
0
29 Mar 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
Anthony L. Caterini
Animesh Garg
Guangwei Yu
101
162
0
28 Mar 2022
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning
Minghao Chen
Fangyun Wei
Chong Li
Deng Cai
AI4TS
105
35
0
28 Mar 2022
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
R Gnana Praveen
W. Melo
Nasib Ullah
Haseeb Aslam
Osama Zeeshan
...
M. Pedersoli
Alessandro Lameiras Koerich
Simon L Bacon
P. Cardinal
Eric Granger
122
71
0
28 Mar 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
100
221
0
28 Mar 2022
LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point Clouds
Jialian Li
Jingyi Zhang
Zhiyong Wang
Siqi Shen
Chenglu Wen
Yuexin Ma
Lan Xu
Jingyi Yu
Cheng-i Wang
3DPC
114
33
0
28 Mar 2022
Discovering Human-Object Interaction Concepts via Self-Compositional Learning
Zhi Hou
Baosheng Yu
Dacheng Tao
92
19
0
27 Mar 2022
Audio-Adaptive Activity Recognition Across Video Domains
Yun C. Zhang
Hazel Doughty
Ling Shao
Cees G. M. Snoek
75
42
0
27 Mar 2022
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
Muheng Li
Lei Chen
Yueqi Duan
Zhilan Hu
Jianjiang Feng
Jie Zhou
Jiwen Lu
79
76
0
26 Mar 2022
Class-Incremental Learning for Action Recognition in Videos
Jaeyoo Park
Minsoo Kang
Bohyung Han
CLL
84
52
0
25 Mar 2022
Learning to Adapt to Unseen Abnormal Activities under Weak Supervision
Jaeyoo Park
Junha Kim
Bohyung Han
OffRL
67
5
0
25 Mar 2022
Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
Reza Ghoddoosian
Isht Dwivedi
Nakul Agarwal
Chiho Choi
Behzad Dariush
69
19
0
24 Mar 2022
Movie Genre Classification by Language Augmentation and Shot Sampling
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
Xin Miao
Jiayi Liu
Huayan Wang
VLM
CLIP
70
1
0
24 Mar 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Leilei Gan
Yi Yang
Yueting Zhuang
Xinze Wang
103
75
0
24 Mar 2022
Interpretable Prediction of Pulmonary Hypertension in Newborns using Echocardiograms
H. Ragnarsdóttir
Laura Manduchi
H. Michel
F. Laumer
S. Wellmann
Ece Ozkan
Julia-Franziska Vogt
66
3
0
24 Mar 2022
Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection
Hitesh Sapkota
Qi Yu
76
40
0
24 Mar 2022
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
Ye Liu
Siyuan Li
Yang Wu
C. Chen
Ying Shan
Xiaohu Qie
ViT
113
151
0
23 Mar 2022
The Challenges of Continuous Self-Supervised Learning
Senthil Purushwalkam
Pedro Morgado
Abhinav Gupta
CLL
89
44
0
23 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
254
1,222
0
23 Mar 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
137
19
0
23 Mar 2022
Contrastive Transformer-based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection
Yu Tian
Guansong Pang
Fengbei Liu
Yuyuan Liu
Chong Wang
Yuanhong Chen
Johan Verjans
G. Carneiro
ViT
MedIm
87
29
0
23 Mar 2022
Enabling faster and more reliable sonographic assessment of gestational age through machine learning
Chace Lee
Angelica Willis
Christina W. Chen
M. Sieniek
Akib A Uddin
...
Rory Pilgrim
Katherine Chou
Daniel Tse
S. Shetty
Ryan G. Gomes
47
0
0
22 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
104
33
0
22 Mar 2022
Generative Adversarial Network for Future Hand Segmentation from Egocentric Video
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
79
14
0
21 Mar 2022
No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces
Jianqi Zhong
Kaichen Zhou
Qingyong Hu
Bing Wang
Niki Trigoni
Andrew Markham
3DPC
101
23
0
21 Mar 2022
Facial Expression Analysis Using Decomposed Multiscale Spatiotemporal Networks
W. Melo
Eric Granger
Miguel Bordallo López
CVBM
86
22
0
21 Mar 2022
LocATe: End-to-end Localization of Actions in 3D with Transformers
Jiankai Sun
Bolei Zhou
Michael J. Black
Arjun Chandrasekaran
143
8
0
21 Mar 2022
FAR: Fourier Aerial Video Recognition
D. Kothandaraman
Tianrui Guan
Xijun Wang
Sean Hu
Ming-Shun Lin
Tianyi Zhou
80
13
0
21 Mar 2022