Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.07750
Cited By
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 1,491 papers shown
Title
VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation
Yihang Li
Shuichiro Shimizu
Weiqi Gu
Chenhui Chu
Sadao Kurohashi
27
13
0
20 Jan 2022
Action Keypoint Network for Efficient Video Recognition
Xu Chen
Yahong Han
Xiaohan Wang
Yifang Sun
Yi Yang
3DPC
32
6
0
17 Jan 2022
Continual Transformers: Redundancy-Free Attention for Online Inference
Lukas Hedegaard
Arian Bakhtiarnia
Alexandros Iosifidis
CLL
32
11
0
17 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
29
103
0
16 Jan 2022
Towards Zero-shot Sign Language Recognition
Yunus Can Bilge
R. G. Cinbis
Nazli Ikizler-Cinbis
SLR
17
36
0
15 Jan 2022
Learning Temporally and Semantically Consistent Unpaired Video-to-video Translation Through Pseudo-Supervision From Synthetic Optical Flow
Kaihong Wang
Kumar Akash
Teruhisa Misu
33
9
0
15 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
52
238
0
12 Jan 2022
OCSampler: Compressing Videos to One Clip with Single-step Sampling
Jintao Lin
Haodong Duan
Kai-xiang Chen
Dahua Lin
Limin Wang
49
24
0
12 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
26
212
0
12 Jan 2022
Motion-Focused Contrastive Learning of Video Representations
Rui Li
Yiheng Zhang
Zhaofan Qiu
Ting Yao
Dong Liu
Tao Mei
SSL
39
34
0
11 Jan 2022
Representing Videos as Discriminative Sub-graphs for Action Recognition
Dong Li
Zhaofan Qiu
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
44
26
0
11 Jan 2022
TSA-Net: Tube Self-Attention Network for Action Quality Assessment
Shunli Wang
Dingkang Yang
Peng Zhai
Chixiao Chen
Lihua Zhang
ViT
37
63
0
11 Jan 2022
The State of Aerial Surveillance: A Survey
Kien Nguyen Thanh
Clinton Fookes
Sridha Sridharan
Yingli Tian
Feng Liu
Xiaoming Liu
Arun Ross
45
23
0
09 Jan 2022
Sign Language Video Retrieval with Free-Form Textual Queries
A. Duarte
Samuel Albanie
Xavier Giró-i-Nieto
Gül Varol
SLR
53
29
0
07 Jan 2022
Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO
Astrid Orcesi
Romaric Audigier
Fritz Poka Toukam
B. Luvison
26
3
0
07 Jan 2022
Cross-Modality Deep Feature Learning for Brain Tumor Segmentation
Dingwen Zhang
Guohai Huang
Qiang Zhang
Jungong Han
Junwei Han
Yizhou Yu
25
217
0
07 Jan 2022
Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training
Shu Zhen Zhang
Zihao Li
Hong-Yu Zhou
Jiechao Ma
Yizhou Yu
37
11
0
05 Jan 2022
Exploring Motion and Appearance Information for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Pan Zhou
Yang Liu
37
41
0
03 Jan 2022
Memory-Guided Semantic Learning Network for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Xing Di
Yu Cheng
Zichuan Xu
Pan Zhou
43
58
0
03 Jan 2022
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
Ivan Skorokhodov
Sergey Tulyakov
Mohamed Elhoseiny
VGen
45
279
0
29 Dec 2021
Extended Self-Critical Pipeline for Transforming Videos to Text (TRECVID-VTT Task 2021) -- Team: MMCUniAugsburg
Philipp Harzig
Moritz Einfalt
K. Ludwig
Rainer Lienhart
ViT
25
0
0
28 Dec 2021
Cross Modal Retrieval with Querybank Normalisation
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
32
84
0
23 Dec 2021
3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve
Lei Wang
Jun Liu
Piotr Koniusz
42
20
0
23 Dec 2021
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
Sofia Broomé
Ernest Pokropek
Boyu Li
Hedvig Kjellström
23
7
0
22 Dec 2021
Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
Alaaeldin El-Nouby
Gautier Izacard
Hugo Touvron
Ivan Laptev
Hervé Jégou
Edouard Grave
SSL
27
150
0
20 Dec 2021
Precondition and Effect Reasoning for Action Recognition
Hongsang Yoo
Haopeng Li
Qiuhong Ke
Liangchen Liu
Rui Zhang
CML
51
4
0
19 Dec 2021
Tell me what you see: A zero-shot action recognition method based on natural language descriptions
Valter Estevam
Rayson Laroca
David Menotti
Hélio Pedrini
43
13
0
18 Dec 2021
Adversarial Memory Networks for Action Prediction
Zhiqiang Tao
Yue Bai
Handong Zhao
Sheng Li
Yuanyuan Kong
Y. Fu
GAN
18
2
0
18 Dec 2021
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition
Yinghao Xu
Fangyun Wei
Xiao Sun
Ceyuan Yang
Yujun Shen
Bo Dai
Bolei Zhou
Stephen Lin
VLM
33
52
0
17 Dec 2021
Distillation of Human-Object Interaction Contexts for Action Recognition
Muna Almushyti
Frederick W. Li
39
3
0
17 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
100
655
0
16 Dec 2021
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition
Benjia Zhou
Pichao Wang
Jun Wan
Yanyan Liang
Fan Wang
Du Zhang
Zhen Lei
Hao Li
Rong Jin
41
29
0
16 Dec 2021
Vision Transformer Based Video Hashing Retrieval for Tracing the Source of Fake Videos
Pengfei Pei
Xianfeng Zhao
Yun Cao
Jinchuan Li
Xiaowei Yi
ViT
34
8
0
15 Dec 2021
Temporal Action Proposal Generation with Background Constraint
Haosen Yang
Wenhao Wu
Lining Wang
Sheng Jin
Boyang Xia
Huanjin Yao
Hujie Huang
23
27
0
15 Dec 2021
Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks
Jaehui Hwang
Huan Zhang
Jun-Ho Choi
Cho-Jui Hsieh
Jong-Seok Lee
AAML
21
5
0
15 Dec 2021
A real-time spatiotemporal AI model analyzes skill in open surgical videos
E. Goodman
Krishna K. Patel
Yilun Zhang
William Locke
C. Kennedy
...
Maren Downing
Hechang Chen
Jevin Z. Clark
G. Brat
Serena Yeung
27
21
0
14 Dec 2021
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
28
54
0
14 Dec 2021
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
36
17
0
13 Dec 2021
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
35
111
0
12 Dec 2021
Self-supervised Spatiotemporal Representation Learning by Exploiting Video Continuity
Hanwen Liang
N. Quader
Zhixiang Chi
Lizhe Chen
Peng Dai
Juwei Lu
Yang Wang
SSL
AI4TS
40
30
0
11 Dec 2021
Cross-Modal Transferable Adversarial Attacks from Images to Videos
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu-Gang Jiang
AAML
32
38
0
10 Dec 2021
Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision
Liangzhe Yuan
Rui Qian
Huayu Chen
Boqing Gong
Florian Schroff
Ming-Hsuan Yang
Hartwig Adam
Ting Liu
AI4TS
30
15
0
09 Dec 2021
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Anirudh Thatipelli
Sanath Narayan
Salman Khan
Rao Muhammad Anwer
Fahad Shahbaz Khan
Guohao Li
ViT
34
88
0
09 Dec 2021
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search
Yi Ding
Xinyu Gong
Junru Wu
Humphrey Shi
Zhicheng Yan
Zhangyang Wang
VGen
52
1
0
09 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Keli Zhang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
32
21
0
09 Dec 2021
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification
Rex Liu
Huan Zhang
Hamed Pirsiavash
Xin Liu
ViT
30
11
0
08 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge Belongie
Ming-Hsuan Yang
Hartwig Adam
Huayu Chen
AI4TS
63
6
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
36
129
0
08 Dec 2021
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs
Kaifeng Gao
Long Chen
Yulei Niu
Jian Shao
Jun Xiao
17
29
0
08 Dec 2021
SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization
Wenbo Gou
Wen Shi
Jian Lou
Lijie Huang
Pan Zhou
Ruixuan Li
AAML
42
2
0
08 Dec 2021
Previous
1
2
3
...
14
15
16
...
28
29
30
Next