1705.07750
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Papers citing
"Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 3,647 papers shown
Title
Spatial-temporal Concept based Explanation of 3D ConvNets
Yi Ji
Yu Wang
K. Mori
Jien Kato
3DPC
FAtt
92
7
0
09 Jun 2022
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
Zihan Ding
Tianrui Hui
Junshi Huang
Xiaoming Wei
Jizhong Han
Si Liu
VOS
73
55
0
08 Jun 2022
Generating Long Videos of Dynamic Scenes
Tim Brooks
Janne Hellsten
M. Aittala
Ting-Chun Wang
Timo Aila
J. Lehtinen
Xuan Li
Alexei A. Efros
Tero Karras
SyDa
104
114
0
07 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
96
115
0
07 Jun 2022
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector
Lin Sui
Chen-Da Liu-Zhang
Lixin Gu
Feng Han
143
8
0
07 Jun 2022
TadML: A fast temporal action detection with Mechanics-MLP
Bowen Deng
Dongchang Liu
83
1
0
07 Jun 2022
A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information
M. Kowal
Mennatullah Siam
Md. Amirul Islam
Neil D. B. Bruce
Richard P. Wildes
Konstantinos G. Derpanis
70
26
0
06 Jun 2022
3D Convolutional with Attention for Action Recognition
Labina Shrestha
Shikha Dubey
Farrukh Olimov
M. Rafique
M. Jeon
38
0
0
05 Jun 2022
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
Xudong Lin
Simran Tiwari
Shiyuan Huang
Manling Li
Mike Zheng Shou
Heng Ji
Shih-Fu Chang
138
21
0
05 Jun 2022
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation
Mingjie Li
Wenjia Cai
Karin Verspoor
Shirui Pan
Xiaodan Liang
Xiaojun Chang
MedIm
88
38
0
04 Jun 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
102
166
0
03 Jun 2022
Egocentric Video-Language Pretraining
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
...
Hongfa Wang
Dima Damen
Guohao Li
Wei Liu
Mike Zheng Shou
VLM
EgoV
104
207
0
03 Jun 2022
Anomaly detection in surveillance videos using transformer based attention model
Kapil Deshpande
Narinder Singh Punn
S. K. Sonbhadra
Sonali Agarwal
ViT
AI4TS
74
12
0
03 Jun 2022
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li
Junyu Chen
Yucheng Tang
Ce Wang
Bennett A. Landman
S. K. Zhou
ViT
OOD
MedIm
181
46
0
02 Jun 2022
A temporal chrominance trigger for clean-label backdoor attack against anti-spoof rebroadcast detection
Wei Guo
B. Tondi
Mauro Barni
AAML
66
13
0
02 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
95
57
0
02 Jun 2022
Cascaded Video Generation for Videos In-the-Wild
Lluis Castrejon
Nicolas Ballas
Aaron Courville
VGen
86
0
0
01 Jun 2022
Dual-stream spatiotemporal networks with feature sharing for monitoring animals in the home cage
Ezechukwu I. Nwokedi
R. Bains
L. Bidaut
Xujiong Ye
Sara Wells
James M. Brown
79
2
0
01 Jun 2022
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering
Jiangtong Li
Li Niu
Liqing Zhang
67
53
0
30 May 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
389
633
0
29 May 2022
Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning
Yanxing Song
Jianzong Wang
Tianbo Wu
Zhangcheng Huang
Jing Xiao
CVBM
131
2
0
29 May 2022
Future Transformer for Long-term Action Anticipation
Dayoung Gong
Joonseok Lee
Manjin Kim
S. Ha
Minsu Cho
AI4TS
53
66
0
27 May 2022
PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences
Hehe Fan
Xin Yu
Yuhang Ding
Yi Yang
Mohan Kankanhalli
3DPC
190
113
0
27 May 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
259
706
0
26 May 2022
Do we really need temporal convolutions in action segmentation?
Dazhao Du
Fuchun Sun
Yu Li
Zhongang Qi
Hui Xiong
Ying Shan
ViT
72
17
0
26 May 2022
Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
Manuel Traub
S. Otte
Tobias Menge
Matthias Karlbauer
Jannik Thummel
Martin Volker Butz
115
20
0
26 May 2022
VIDI: A Video Dataset of Incidents
Duygu Sesver
Alp Eren Gençoglu
Ç. Yildiz
Zehra Günindi
Faeze Habibi
Z. A. Yazici
H. K. Ekenel
68
4
0
26 May 2022
You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos
Xin Sun
Xinyu Wang
Jialin Gao
Qiong Liu
Xiaoping Zhou
96
34
0
25 May 2022
Detection of Fights in Videos: A Comparison Study of Anomaly Detection and Action Recognition
Weijun Tan
Jingfeng Liu
71
8
0
23 May 2022
Deep Learning for Visual Speech Analysis: A Survey
Changchong Sheng
Gangyao Kuang
L. Bai
Chen Hou
Y. Guo
Xin Xu
M. Pietikäinen
Li Liu
VLM
98
36
0
22 May 2022
GL-RG: Global-Local Representation Granularity for Video Captioning
Liqi Yan
Qifan Wang
Yiming Cui
Fuli Feng
Xiaojun Quan
Xinming Zhang
Dongfang Liu
125
59
0
22 May 2022
Structured Attention Composition for Temporal Action Localization
Le Yang
Junwei Han
Tao Zhao
Nian Liu
Dingwen Zhang
84
17
0
20 May 2022
Cross-Enhancement Transformer for Action Segmentation
Jiahui Wang
Zhenyou Wang
Shanna Zhuang
Hui Wang
ViT
97
23
0
19 May 2022
PYSKL: Towards Good Practices for Skeleton Action Recognition
Haodong Duan
Jiaqi Wang
Kai-xiang Chen
Dahua Lin
VLM
88
147
0
19 May 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
202
62
0
17 May 2022
Learnable Optimal Sequential Grouping for Video Scene Detection
Daniel Rotman
Yevgeny Yaroker
Elad Amrani
Udi Barzelay
Rami Ben-Ari
35
10
0
17 May 2022
ETAD: Training Action Detection End to End on a Laptop
Shuming Liu
Mengmeng Xu
Chen Zhao
Xu Zhao
Guohao Li
78
7
0
14 May 2022
Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild
Fuyan Ma
Bin Sun
Shutao Li
ViT
57
31
0
10 May 2022
Scaling up sign spotting through sign language dictionaries
Gül Varol
Liliane Momeni
Samuel Albanie
Triantafyllos Afouras
Andrew Zisserman
71
15
0
09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
94
128
0
08 May 2022
Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study
Liqun Lin
Zheng Wang
Jiachen He
Weiling Chen
Yiwen Xu
Tiesong Zhao
84
7
0
07 May 2022
Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement
Bing Li
Jiaxin Chen
Dongming Zhang
Xiuguo Bao
Di Huang
56
15
0
07 May 2022
An Empirical Study on Activity Recognition in Long Surgical Videos
Zhuohong He
A. Mottaghi
Aidean Sharghi
Muhammad Abdullah Jamal
Omid Mohareri
90
12
0
05 May 2022
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Mingdong Yang
Guo Chen
Yin-Dong Zheng
Tong Lu
Limin Wang
100
48
0
05 May 2022
Deep Neural Network approaches for Analysing Videos of Music Performances
F. Liwicki
Richa Upadhyay
Prakash Chandra Chhipa
Killian Murphy
F. Visi
S. Östersjö
Marcus Liwicki
64
1
0
05 May 2022
ANUBIS: Skeleton Action Recognition Dataset, Review, and Benchmark
Zhenyue Qin
Yang Liu
Madhawa Perera
Tom Gedeon
Pan Ji
Dongwoo Kim
Saeed Anwar
60
4
0
04 May 2022
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
Haodong Duan
Nanxuan Zhao
Kai-xiang Chen
Dahua Lin
ViT
AI4TS
82
19
0
04 May 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
74
0
0
03 May 2022
Cross-modal Representation Learning for Zero-shot Action Recognition
Chung-Ching Lin
Kevin Qinghong Lin
Linjie Li
Lijuan Wang
Zicheng Liu
ViT
62
29
0
03 May 2022
Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization
Qinying Liu
Zilei Wang
Ruoxi Chen
Zhilin Li
74
4
0
01 May 2022