1705.07750
Cited By
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 3,647 papers shown
Title
Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition
Xiaoguang Zhu
Ye Zhu
Haoyu Wang
Honglin Wen
Yan Yan
Peilin Liu
101
28
0
23 Feb 2022
Movies2Scenes: Using Movie Metadata to Learn Scene Representation
Shixing Chen
Chundi Liu
Xiang Hao
Xiaohan Nie
Maxim Arap
Raffay Hamid
69
17
0
22 Feb 2022
Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations
Yoshihiro Yamazaki
Shota Orihashi
Ryo Masumura
Mihiro Uchida
Akihiko Takashima
45
8
0
21 Feb 2022
Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study
Yuecong Xu
Jianfei Yang
Haozhi Cao
Jianxiong Yin
Zhenghua Chen
Xiaoli Li
Zhengguo Li
Qiaoqiao Xu
83
2
0
19 Feb 2022
Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses
Phoebe K. Chua
D. Makris
Dorien Herremans
Gemma Roig
Design
87
9
0
19 Feb 2022
(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering
A. Cherian
Chiori Hori
Tim K. Marks
Jonathan Le Roux
113
38
0
18 Feb 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
186
228
0
18 Feb 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
59
3
0
16 Feb 2022
ActionFormer: Localizing Moments of Actions with Transformers
Chenlin Zhang
Jianxin Wu
Yin Li
ViT
129
342
0
16 Feb 2022
HAKE: A Knowledge Engine Foundation for Human Activity Understanding
Yong-Lu Li
Xinpeng Liu
Xiaoqian Wu
Yizhuo Li
Zuoyu Qiu
Liang Xu
Yue Xu
Haoshu Fang
Cewu Lu
99
38
0
14 Feb 2022
Adaptive Graph Convolutional Networks for Weakly Supervised Anomaly Detection in Videos
Congqi Cao
Xin Zhang
Shizhou Zhang
Peng Wang
Yanning Zhang
AI4TS
107
23
0
14 Feb 2022
Robust Deepfake On Unrestricted Media: Generation And Detection
Trung-Nghia Le
H. Nguyen
Junichi Yamagishi
Isao Echizen
97
8
0
13 Feb 2022
Learning Temporal Rules from Noisy Timeseries Data
Karan Samel
Zelin Zhao
Binghong Chen
Shuang Li
D. Subramanian
Irfan Essa
Le Song
NoLa
NAI
AI4TS
AI4CE
24
2
0
11 Feb 2022
Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks
Nan Wu
Stanislaw Jastrzebski
Kyunghyun Cho
Krzysztof J. Geras
77
76
0
10 Feb 2022
Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition
Zhigang Tu
Jiaxu Zhang
Hongyan Li
Yujin Chen
Junsong Yuan
86
86
0
08 Feb 2022
A Coding Framework and Benchmark towards Compressed Video Understanding
Yuan Tian
Guo Lu
Yichao Yan
Guangtao Zhai
Lixing Chen
Zhiyong Gao
83
25
0
06 Feb 2022
Towards To-a-T Spatio-Temporal Focus for Skeleton-Based Action Recognition
Lipeng Ke
Kuan-Chuan Peng
Siwei Lyu
3DPC
67
34
0
04 Feb 2022
Video Violence Recognition and Localization Using a Semi-Supervised Hard Attention Model
Hamid Reza Mohammadi
Ehsan Nazerfard
106
25
0
04 Feb 2022
MMSys'22 Grand Challenge on AI-based Video Production for Soccer
Cise Midoglu
Steven A. Hicks
Vajira Thambawita
T. Kupka
Pål Halvorsen
VGen
92
14
0
02 Feb 2022
Should I take a walk? Estimating Energy Expenditure from Video Data
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
75
4
0
01 Feb 2022
A Dataset for Medical Instructional Video Classification and Question Answering
D. Gupta
Kush Attal
Dina Demner-Fushman
114
33
0
30 Jan 2022
TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images
Jiangyun Li
Wenxuan Wang
Chen Chen
Tianxiang Zhang
Sen Zha
Jing Wang
Hong Yu
ViT
MedIm
130
25
0
30 Jan 2022
Assessing Cross-dataset Generalization of Pedestrian Crossing Predictors
Joseph Gesnouin
Steve Pechberti
B. Stanciulescu
Fabien Moutarde
67
13
0
29 Jan 2022
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
111
87
0
26 Jan 2022
Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition
Kiyoon Kim
Shreyank N. Gowda
Oisin Mac Aodha
Laura Sevilla-Lara
111
10
0
25 Jan 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
250
384
0
24 Jan 2022
vCLIMB: A Novel Video Class Incremental Learning Benchmark
Andrés Villa
Kumail Alhamoud
Juan Carlos León Alcázar
Fabian Caba Heilbron
Victor Escorcia
Guohao Li
CLL
165
33
0
23 Jan 2022
Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval
Jianfeng Dong
Yabing Wang
Xianke Chen
Xiaoye Qu
Xirong Li
Y. He
Xun Wang
87
59
0
23 Jan 2022
LTC-GIF: Attracting More Clicks on Feature-length Sports Videos
Ghulam Mujtaba
Jaehyuk Choi
Eun‐Seok Ryu
44
0
0
22 Jan 2022
VIPriors 2: Visual Inductive Priors for Data-Efficient Deep Learning Challenges
A. Lengyel
Robert-Jan Bruintjes
Marcos Baptista-Rios
O. Kayhan
Davide Zambrano
Nergis Tomen
Jan van Gemert
VLM
78
11
0
21 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
292
237
0
20 Jan 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
89
170
0
20 Jan 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
3DGS
116
41
0
20 Jan 2022
VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation
Yihang Li
Shuichiro Shimizu
Weiqi Gu
Chenhui Chu
Sadao Kurohashi
57
15
0
20 Jan 2022
Action Keypoint Network for Efficient Video Recognition
Xu Chen
Yahong Han
Xiaohan Wang
Yifang Sun
Yi Yang
3DPC
82
6
0
17 Jan 2022
Continual Transformers: Redundancy-Free Attention for Online Inference
Lukas Hedegaard
Arian Bakhtiarnia
Alexandros Iosifidis
CLL
110
12
0
17 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
143
107
0
16 Jan 2022
Towards Zero-shot Sign Language Recognition
Yunus Can Bilge
R. G. Cinbis
Nazli Ikizler-Cinbis
SLR
58
36
0
15 Jan 2022
Learning Temporally and Semantically Consistent Unpaired Video-to-video Translation Through Pseudo-Supervision From Synthetic Optical Flow
Kaihong Wang
Kumar Akash
Teruhisa Misu
83
11
0
15 Jan 2022
Transformers in Action: Weakly Supervised Action Segmentation
John Ridley
Huseyin Coskun
D. Tan
Nassir Navab
F. Tombari
ViT
48
5
0
14 Jan 2022
Hand-Object Interaction Reasoning
Jian Ma
Dima Damen
71
7
0
13 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
145
254
0
12 Jan 2022
OCSampler: Compressing Videos to One Clip with Single-step Sampling
Jintao Lin
Haodong Duan
Kai-xiang Chen
Dahua Lin
Limin Wang
78
24
0
12 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
99
221
0
12 Jan 2022
Motion-Focused Contrastive Learning of Video Representations
Rui Li
Yiheng Zhang
Zhaofan Qiu
Ting Yao
Dong Liu
Tao Mei
SSL
92
35
0
11 Jan 2022
Representing Videos as Discriminative Sub-graphs for Action Recognition
Dong Li
Zhaofan Qiu
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
108
26
0
11 Jan 2022
Boosting Video Representation Learning with Multi-Faceted Integration
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Xiaoping Zhang
Dong Wu
Tao Mei
63
9
0
11 Jan 2022
Condensing a Sequence to One Informative Frame for Video Recognition
Zhaofan Qiu
Ting Yao
Y. Shu
Chong-Wah Ngo
Tao Mei
152
9
0
11 Jan 2022
Optimization Planning for 3D ConvNets
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
3DPC
3DH
91
9
0
11 Jan 2022
TSA-Net: Tube Self-Attention Network for Action Quality Assessment
Shunli Wang
Dingkang Yang
Peng Zhai
Chixiao Chen
Lihua Zhang
ViT
81
70
0
11 Jan 2022