1705.07750
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Papers citing
"Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 3,647 papers shown
Title
Anomaly Detection in Video Sequences: A Benchmark and Computational Model
Boyang Wan
Wenhui Jiang
Yuming Fang
Zhiyuan Luo
Guanqun Ding
AI4TS
75
48
0
16 Jun 2021
Watching Too Much Television is Good: Self-Supervised Audio-Visual Representation Learning from Movies and TV Shows
Mahdi M. Kalayeh
Nagendra Kamath
Lingyi Liu
Ashok Chandrashekar
SSL
31
2
0
16 Jun 2021
Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
Mateusz Malinowski
Dimitrios Vytiniotis
G. Swirszcz
Viorica Patraucean
João Carreira
65
8
0
15 Jun 2021
Multi-StyleGAN: Towards Image-Based Simulation of Time-Lapse Live-Cell Microscopy
Christoph Reich
Tim Prangemeier
C. Wildner
Heinz Koeppl
AI4CE
61
9
0
15 Jun 2021
Relation Modeling in Spatio-Temporal Action Localization
Yutong Feng
Jianwen Jiang
Ziyuan Huang
Zhiwu Qing
Xiang Wang
Shiwei Zhang
Mingqian Tang
Yue Gao
70
11
0
15 Jun 2021
A Stronger Baseline for Ego-Centric Action Detection
Zhiwu Qing
Ziyuan Huang
Xiang Wang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Mingqian Tang
Changxin Gao
M. Ang
Nong Sang
EgoV
61
3
0
13 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
161
11
0
12 Jun 2021
Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization
Ludan Ruan
Jieting Chen
Yuqing Song
Shizhe Chen
Qin Jin
34
0
0
11 Jun 2021
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
95
127
0
10 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
114
282
0
09 Jun 2021
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition
Ziyuan Huang
Zhiwu Qing
Xiang Wang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Zhurong Xia
Mingqian Tang
Nong Sang
M. Ang
ViT
64
11
0
09 Jun 2021
Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models
Chenfeng Xu
Shijia Yang
Tomer Galanti
Bichen Wu
Xiangyu Yue
Bohan Zhai
Wei Zhan
Peter Vajda
Kurt Keutzer
Masayoshi Tomizuka
3DPC
62
55
0
08 Jun 2021
Few-Shot Action Localization without Knowing Boundaries
Tingting Xie
Christos Tzelepis
Fan Fu
Ioannis Patras
67
5
0
08 Jun 2021
Novel View Video Prediction Using a Dual Representation
Sarah Shiraz
Krishna Regmi
Shruti Vyas
Yogesh S Rawat
M. Shah
74
6
0
07 Jun 2021
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
Okan Kopuklu
Maja Taseska
Gerhard Rigoll
3DV
100
46
0
07 Jun 2021
Transformed ROIs for Capturing Visual Transformations in Videos
Abhinav Rai
Fadime Sener
Angela Yao
ViT
69
3
0
06 Jun 2021
Hierarchical Video Generation for Complex Data
Lluis Castrejon
Nicolas Ballas
Aaron Courville
VGen
69
4
0
04 Jun 2021
Anticipative Video Transformer
Rohit Girdhar
Kristen Grauman
ViT
96
212
0
03 Jun 2021
Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
127
12
0
03 Jun 2021
CT-Net: Channel Tensorization Network for Video Classification
Kunchang Li
Xianhang Li
Yali Wang
Jun Wang
Yu Qiao
ViT
74
55
0
03 Jun 2021
Deconfounded Video Moment Retrieval with Causal Intervention
Xun Yang
Fuli Feng
Wei Ji
Meng Wang
Tat-Seng Chua
CML
VGen
82
191
0
03 Jun 2021
TSI: Temporal Saliency Integration for Video Action Recognition
Haisheng Su
Kunchang Li
Jinyuan Feng
Dongliang Wang
Weihao Gan
Wei Wu
Yu Qiao
67
4
0
02 Jun 2021
Dual Normalization Multitasking for Audio-Visual Sounding Object Localization
Tokuhiro Nishikawa
Daiki Shimada
Jerry Jun Yokono
29
0
0
01 Jun 2021
Continual 3D Convolutional Neural Networks for Real-time Processing of Videos
Lukas Hedegaard
Alexandros Iosifidis
3DPC
94
15
0
31 May 2021
Connecting Language and Vision for Natural Language-Based Vehicle Retrieval
Shuai Bai
Zhedong Zheng
Xiaohan Wang
Junyang Lin
Zhu Zhang
Chang Zhou
Yi Yang
Hongxia Yang
103
27
0
31 May 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song
Shizhe Chen
Qin Jin
68
38
0
30 May 2021
Maintaining Common Ground in Dynamic Environments
Takuma Udagawa
Akiko Aizawa
48
13
0
29 May 2021
SSCAP: Self-supervised Co-occurrence Action Parsing for Unsupervised Temporal Action Segmentation
Zhe Wang
Hao Chen
Xinyu Li
Chunhui Liu
Yuanjun Xiong
Joseph Tighe
Charless C. Fowlkes
118
20
0
29 May 2021
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
Sateesh Kumar
S. Haresh
Awais Ahmed
Andrey Konin
M. Zia
Quoc-Huy Tran
SSL
108
48
0
27 May 2021
Tracking Without Re-recognition in Humans and Machines
Drew Linsley
Girik Malik
Junkyung Kim
L. Govindarajan
E. Mingolla
Thomas Serre
69
18
0
27 May 2021
SSAN: Separable Self-Attention Network for Video Representation Learning
Xudong Guo
Xun Guo
Yan Lu
ViT
AI4TS
55
26
0
27 May 2021
Detecting Biological Locomotion in Video: A Computational Approach
Soo-Min Kang
Richard P. Wildes
45
0
0
26 May 2021
Improving Sign Language Translation with Monolingual Data by Sign Back-Translation
Hao Zhou
Wen-gang Zhou
Weizhen Qi
Junfu Pu
Houqiang Li
SLR
65
194
0
26 May 2021
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning
Wenhao Wu
Yuxiang Zhao
Yanwu Xu
Xiao Tan
Dongliang He
...
Jinxing Ye
Yingying Li
Mingde Yao
Zichao Dong
Yifeng Shi
AI4TS
93
30
0
25 May 2021
Temporal Action Proposal Generation with Transformers
Lining Wang
Haosen Yang
Wenhao Wu
Huanjin Yao
Hujie Huang
ViT
85
28
0
25 May 2021
GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot Action Recognition
Bin Sun
Dehui Kong
Shaofan Wang
Jinghua Li
Baocai Yin
Xiaonan Luo
55
18
0
25 May 2021
ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos
Meng-Jiun Chiou
Chun-Yu Liao
Li-Wei Wang
Roger Zimmermann
Jiashi Feng
107
27
0
25 May 2021
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization
Yi Liu
Limin Wang
Yali Wang
Xiao Ma
Yu Qiao
102
62
0
24 May 2021
Coarse to Fine Multi-Resolution Temporal Convolutional Network
Dipika Singhania
R. Rahaman
Angela Yao
AI4TS
85
55
0
23 May 2021
Video-based Person Re-identification without Bells and Whistles
Chih-Ting Liu
Jun-Cheng Chen
Chu-Song Chen
Shao-Yi Chien
115
15
0
22 May 2021
Sharing Pain: Using Pain Domain Transfer for Video Recognition of Low Grade Orthopedic Pain in Horses
Sofia Broomé
K. Ask
Maheen Rashid-Engström
Pia Haubro Andersen
Hedvig Kjellström
85
12
0
21 May 2021
Egocentric Activity Recognition and Localization on a 3D Map
Miao Liu
Lingni Ma
Kiran Somasundaram
Yin Li
Kristen Grauman
James M. Rehg
Chao Li
EgoV
69
20
0
20 May 2021
Medical Image Segmentation Using Squeeze-and-Expansion Transformers
Shaohua Li
Xiuchao Sui
Xiangde Luo
Xinxing Xu
Yong Liu
Rick Siow Mong Goh
ViT
MedIm
78
170
0
20 May 2021
Non-contact Pain Recognition from Video Sequences with Remote Physiological Measurements Prediction
Ruijing Yang
Ziyu Guan
Zitong Yu
Xiaoyi Feng
Jinye Peng
Guoying Zhao
37
10
0
18 May 2021
Parallel Attention Network with Sequence Matching for Video Grounding
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
109
41
0
18 May 2021
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
205
507
0
18 May 2021
VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living
Srijan Das
Rui Dai
Di Yang
Francois Bremond
ViT
104
70
0
17 May 2021
Leveraging Semantic Scene Characteristics and Multi-Stream Convolutional Architectures in a Contextual Approach for Video-Based Visual Emotion Recognition in the Wild
Ioannis Pikoulis
P. Filntisis
Petros Maragos
89
14
0
16 May 2021
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Yixuan Li
Lei Chen
Runyu He
Zhenzhi Wang
Gangshan Wu
Limin Wang
127
100
0
16 May 2021
Cross-Modal Progressive Comprehension for Referring Segmentation
Si Liu
Tianrui Hui
Shaofei Huang
Yunchao Wei
Yue Liu
Guanbin Li
EgoV
VOS
86
130
0
15 May 2021