Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.07750
Cited By
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 1,478 papers shown
Title
Temporal Action Localization Using Gated Recurrent Units
Hassan Keshvari Khojasteh
Hoda Mohammadzade
H. Behroozi
23
3
0
07 Aug 2021
Token Shift Transformer for Video Classification
Hao Zhang
Y. Hao
Chong-Wah Ngo
ViT
32
116
0
05 Aug 2021
Hybrid Reasoning Network for Video-based Commonsense Captioning
Weijiang Yu
Jian Liang
Lei Ji
Lu Li
Yuejian Fang
Nong Xiao
Nan Duan
24
10
0
05 Aug 2021
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
Rui Qian
Yuxi Li
Huabin Liu
John See
Shuangrui Ding
Xian Liu
Dian Li
Weiyao Lin
35
42
0
04 Aug 2021
Optimizing Latency for Online Video CaptioningUsing Audio-Visual Transformers
Chiori Hori
Takaaki Hori
Jonathan Le Roux
25
4
0
04 Aug 2021
Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning
Siyuan Yang
Jun Liu
Shijian Lu
Meng Hwa Er
Alex C. Kot
3DH
3DPC
43
91
0
04 Aug 2021
OncoNet: Weakly Supervised Siamese Network to automate cancer treatment response assessment between longitudinal FDG PET/CT examinations
Anirudh Joshi
Sabri Eyuboglu
Shih-Cheng Huang
Jared A. Dunnmon
Arjun Soin
G. Davidzon
Akshay S. Chaudhari
M. Lungren
14
3
0
03 Aug 2021
Video Generation from Text Employing Latent Path Construction for Temporal Modeling
Amir Mazaheri
M. Shah
30
8
0
29 Jul 2021
Insights from Generative Modeling for Neural Video Compression
Ruihan Yang
Yibo Yang
Joseph Marino
Stephan Mandt
VGen
35
16
0
28 Jul 2021
Enriching Local and Global Contexts for Temporal Action Localization
Zixin Zhu
Wei Tang
Le Wang
N. Zheng
G. Hua
29
108
0
27 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
24
122
0
26 Jul 2021
HANet: Hierarchical Alignment Networks for Video-Text Retrieval
Peng Wu
Xiangteng He
Mingqian Tang
Yiliang Lv
Jing Liu
42
52
0
26 Jul 2021
Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
Abhishek Aich
Meng Zheng
Srikrishna Karanam
Terrence Chen
Amit K. Roy-Chowdhury
Ziyan Wu
42
70
0
25 Jul 2021
Adaptive Recursive Circle Framework for Fine-grained Action Recognition
Hanxi Lin
Xinxiao Wu
Jiebo Luo
30
1
0
25 Jul 2021
EAN: Event Adaptive Network for Enhanced Action Recognition
Yuan Tian
Yichao Yan
Guangtao Zhai
G. Guo
Zhiyong Gao
40
41
0
22 Jul 2021
Evidential Deep Learning for Open Set Action Recognition
Wentao Bao
Qi Yu
Yu Kong
CML
EDL
26
135
0
21 Jul 2021
Multi-modal Residual Perceptron Network for Audio-Video Emotion Recognition
Xin Chang
W. Skarbek
30
19
0
21 Jul 2021
UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition
Di Yang
Yaohui Wang
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
Francois Bremond
32
47
0
19 Jul 2021
Is attention to bounding boxes all you need for pedestrian action prediction?
Lina Achaji
Julien Moreau
Thibault Fouqueray
François Aioun
François Charpillet
23
30
0
16 Jul 2021
End-to-end Multi-modal Video Temporal Grounding
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
11
51
0
12 Jul 2021
Aligning Correlation Information for Domain Adaptation in Action Recognition
Yuecong Xu
Jianfei Yang
Haozhi Cao
K. Mao
Jianxiong Yin
Simon See
24
38
0
11 Jul 2021
TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition
Shuyuan Li
Huabin Liu
Rui Qian
Yuxi Li
John See
Mengjuan Fei
Xiaoyuan Yu
W. Lin
28
75
0
10 Jul 2021
Long Short-Term Transformer for Online Action Detection
Mingze Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Xia
Zhuowen Tu
Stefano Soatto
ViT
40
130
0
07 Jul 2021
Do Different Tracking Tasks Require Different Appearance Models?
Zhongdao Wang
Hengshuang Zhao
Yali Li
Shengjin Wang
Philip Torr
Luca Bertinetto
42
82
0
05 Jul 2021
Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition
Vittorio Mazzia
Simone Angarano
Francesco Salvetti
Federico Angelini
Marcello Chiaberge
ViT
38
137
0
01 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
59
95
0
01 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
48
544
0
30 Jun 2021
Spatio-Temporal Context for Action Detection
Manuel Sarmiento Calderó
David Varas
Elisenda Bou
27
2
0
29 Jun 2021
Can An Image Classifier Suffice For Action Recognition?
Quanfu Fan
Chun-Fu Chen
Chen
Yikang Shen
ViT
36
33
0
26 Jun 2021
Detection of Deepfake Videos Using Long Distance Attention
Wei Lu
Lingyi Liu
Junwei Luo
Xianfeng Zhao
Yicong Zhou
Jiwu Huang
CVBM
32
22
0
24 Jun 2021
IA-RED
2
^2
2
: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan
Yikang Shen
Yi Ding
Zhangyang Wang
Rogerio Feris
A. Oliva
VLM
ViT
44
156
0
23 Jun 2021
Towards Long-Form Video Understanding
Chaoxia Wu
Philipp Krahenbuhl
VLM
ViT
59
166
0
21 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
37
127
0
21 Jun 2021
Does Optimal Source Task Performance Imply Optimal Pre-training for a Target Task?
Steven Gutstein
Brent Lance
Sanjay Shakkottai
27
1
0
21 Jun 2021
OadTR: Online Action Detection with Transformers
Xiang Wang
Shiwei Zhang
Zhiwu Qing
Yuanjie Shao
Zhe Zuo
Changxin Gao
Nong Sang
OffRL
ViT
34
110
0
21 Jun 2021
Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling
Xiang Wang
Zhiwu Qing
Ziyuan Huang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Mingqian Tang
Yuanjie Shao
Nong Sang
29
4
0
20 Jun 2021
Video Summarization through Reinforcement Learning with a 3D Spatio-Temporal U-Net
Tianrui Liu
Qingjie Meng
Jun-Jie Huang
Athanasios Vlontzos
Daniel Rueckert
Bernhard Kainz
OffRL
AI4TS
26
70
0
19 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
18
53
0
19 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
40
11
0
18 Jun 2021
EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report
Lijin Yang
Yifei Huang
Yusuke Sugano
Yoichi Sato
16
5
0
18 Jun 2021
Relation Modeling in Spatio-Temporal Action Localization
Yutong Feng
Jianwen Jiang
Ziyuan Huang
Zhiwu Qing
Xiang Wang
Shiwei Zhang
Mingqian Tang
Yue Gao
33
11
0
15 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
27
11
0
12 Jun 2021
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
36
124
0
10 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
32
274
0
09 Jun 2021
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition
Ziyuan Huang
Zhiwu Qing
Xiang Wang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Zhurong Xia
Mingqian Tang
Nong Sang
M. Ang
ViT
27
11
0
09 Jun 2021
Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models
Chenfeng Xu
Shijia Yang
Tomer Galanti
Bichen Wu
Xiangyu Yue
Bohan Zhai
Wei Zhan
Peter Vajda
Kurt Keutzer
Masayoshi Tomizuka
3DPC
39
53
0
08 Jun 2021
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
Okan Kopuklu
Maja Taseska
Gerhard Rigoll
3DV
31
45
0
07 Jun 2021
Hierarchical Video Generation for Complex Data
Lluis Castrejon
Nicolas Ballas
Aaron Courville
VGen
22
4
0
04 Jun 2021
CT-Net: Channel Tensorization Network for Video Classification
Kunchang Li
Xianhang Li
Yali Wang
Jun Wang
Yu Qiao
ViT
30
55
0
03 Jun 2021
Continual 3D Convolutional Neural Networks for Real-time Processing of Videos
Lukas Hedegaard
Alexandros Iosifidis
3DPC
25
14
0
31 May 2021
Previous
1
2
3
...
17
18
19
...
28
29
30
Next