ResearchTrend.AI
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

22 May 2017
João Carreira
Andrew Zisserman

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 3,647 papers shown
Title
RGB Stream Is Enough for Temporal Action Detection
Chenhao Wang
Hongxiang Cai
Yuxin Zou
Yichao Xiong
76
25
0
09 Jul 2021
Universal 3-Dimensional Perturbations for Black-Box Attacks on Video Recognition Systems
Shangyu Xie
Han Wang
Yu Kong
Yuan Hong
AAML
75
27
0
09 Jul 2021
Long Short-Term Transformer for Online Action Detection
Mingze Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Xia
Zhuowen Tu
Stefano Soatto
ViT
154
137
0
07 Jul 2021
iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
A. Blattmann
Timo Milbich
Michael Dorkenwald
Bjorn Ommer
DiffM, VGen
80
42
0
06 Jul 2021
Do Different Tracking Tasks Require Different Appearance Models?
Zhongdao Wang
Hengshuang Zhao
Yali Li
Shengjin Wang
Philip Torr
Luca Bertinetto
111
86
0
05 Jul 2021
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory
Xuejiao Tang
Xin Huang
Wenbin Zhang
T. Child
Qiong Hu
Zhen Liu
Ji Zhang
LRM
81
19
0
04 Jul 2021
Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition
Vittorio Mazzia
Simone Angarano
Francesco Salvetti
Federico Angelini
Marcello Chiaberge
ViT
118
143
0
01 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
169
101
0
01 Jul 2021
VideoLightFormer: Lightweight Action Recognition using Transformers
Raivo Koot
Haiping Lu
ViT
135
6
0
01 Jul 2021
PoliTO-IIT Submission to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition
Chiara Plizzari
M. Planamente
Emanuele Alberti
Barbara Caputo
100
2
0
01 Jul 2021
iMiGUE: An Identity-free Video Dataset for Micro-Gesture Understanding and Emotion Analysis
Xin Liu
Henglin Shi
Haoyu Chen
Zitong Yu
Xiaobai Li
Guoying Zhao
96
83
0
01 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
156
576
0
30 Jun 2021
Cyclist Trajectory Forecasts by Incorporation of Multi-View Video Information
Stefan Zernetsch
Oliver Trupp
Viktor Kress
Konrad Doll
Bernhard Sick
41
3
0
30 Jun 2021
Word-level Sign Language Recognition with Multi-stream Neural Networks Focusing on Local Regions
Mizuki Maruyama
S. Ghose
Katsufumi Inoue
P. Roy
Masakazu Iwamura
M. Yoshioka
SLR
28
18
0
30 Jun 2021
When Video Classification Meets Incremental Classes
Hanbin Zhao
Xin Qin
Shihao Su
Yongjian Fu
Zibo Lin
Xi Li
CLL
75
28
0
30 Jun 2021
Long-Short Temporal Modeling for Efficient Action Recognition
Liyu Wu
Yuexian Zou
Can Zhang
40
1
0
30 Jun 2021
Spatio-Temporal Context for Action Detection
Manuel Sarmiento Calderó
David Varas
Elisenda Bou
65
2
0
29 Jun 2021
Unsupervised Discovery of Actions in Instructional Videos
A. Piergiovanni
A. Angelova
Michael S. Ryoo
Irfan Essa
36
3
0
28 Jun 2021
Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization
Anurag Bagchi
Jazib Mahmood
Dolton Fernandes
Ravi Kiran Sarvadevabhatla
155
23
0
27 Jun 2021
Can An Image Classifier Suffice For Action Recognition?
Quanfu Fan
Chun-Fu Chen
Chen
Yikang Shen
ViT
95
34
0
26 Jun 2021
Exploring Temporal Context and Human Movement Dynamics for Online Action Detection in Videos
V. Vasileiou
N. Kardaris
Petros Maragos
51
2
0
26 Jun 2021
Generalized One-Class Learning Using Pairs of Complementary Classifiers
A. Cherian
Jue Wang
VLM
33
1
0
24 Jun 2021
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
127
1,503
0
24 Jun 2021
DROID: Driver-centric Risk Object Identification
Chengxi Li
Stanley H. Chan
Yi-Ting Chen
116
8
0
24 Jun 2021
Exploring Stronger Feature for Temporal Action Localization
Zhiwu Qing
Xiang Wang
Ziyuan Huang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Mingqian Tang
Changxin Gao
Nong Sang
59
4
0
24 Jun 2021
Detection of Deepfake Videos Using Long Distance Attention
Wei Lu
Lingyi Liu
Junwei Luo
Xianfeng Zhao
Yicong Zhou
Jiwu Huang
CVBM
70
24
0
24 Jun 2021
IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan
Yikang Shen
Yi Ding
Zhangyang Wang
Rogerio Feris
A. Oliva
VLM, ViT
152
165
0
23 Jun 2021
Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures
Fernando Pérez-García
C. Scott
Rachel Sparks
B. Diehl
Sébastien Ourselin
SLR
53
17
0
22 Jun 2021
Towards Long-Form Video Understanding
Chaoxia Wu
Philipp Krahenbuhl
VLM, ViT
125
170
0
21 Jun 2021
Understanding Object Dynamics for Interactive Image-to-Video Synthesis
A. Blattmann
Timo Milbich
Michael Dorkenwald
Bjorn Ommer
DiffM, VGen
83
40
0
21 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
156
129
0
21 Jun 2021
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
Hao Tan
Jie Lei
Thomas Wolf
Joey Tianyi Zhou
120
67
0
21 Jun 2021
Does Optimal Source Task Performance Imply Optimal Pre-training for a Target Task?
Steven Gutstein
Brent Lance
Sanjay Shakkottai
29
1
0
21 Jun 2021
OadTR: Online Action Detection with Transformers
Xiang Wang
Shiwei Zhang
Zhiwu Qing
Yuanjie Shao
Zhe Zuo
Changxin Gao
Nong Sang
OffRL, ViT
110
117
0
21 Jun 2021
Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track
Yuanhao Zhai
Le Wang
David Doermann
Junsong Yuan
38
1
0
21 Jun 2021
Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling
Xiang Wang
Zhiwu Qing
Ziyuan Huang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Mingqian Tang
Yuanjie Shao
Nong Sang
140
4
0
20 Jun 2021
Proposal Relation Network for Temporal Action Detection
Xiang Wang
Zhiwu Qing
Ziyuan Huang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Mingqian Tang
Changxin Gao
Nong Sang
ViT
47
25
0
20 Jun 2021
Video Summarization through Reinforcement Learning with a 3D Spatio-Temporal U-Net
Tianrui Liu
Qingjie Meng
Jun-Jie Huang
Athanasios Vlontzos
Daniel Rueckert
Bernhard Kainz
OffRL, AI4TS
69
73
0
19 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
82
54
0
19 Jun 2021
End-to-end Temporal Action Detection with Transformer
Xiaolong Liu
Qimeng Wang
Yao Hu
Xu Tang
Shiwei Zhang
S. Bai
X. Bai
ViT
124
234
0
18 Jun 2021
All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers
Carmelo Scribano
D. Sapienza
Giorgia Franchini
M. Verucchi
Marko Bertogna
58
4
0
18 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
110
11
0
18 Jun 2021
Discerning Generic Event Boundaries in Long-Form Wild Videos
Ayush K. Rai
Tarun Krishna
J. Dietlmeier
Kevin McGuinness
Alan F. Smeaton
Noel E. O'Connor
48
5
0
18 Jun 2021
EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report
Lijin Yang
Yifei Huang
Yusuke Sugano
Yoichi Sato
50
5
0
18 Jun 2021
PyKale: Knowledge-Aware Machine Learning from Multiple Sources in Python
Haiping Lu
Xianyuan Liu
Robert Turner
Peizhen Bai
Raivo Koot
Shuo Zhou
Mustafa Chasmai
Lawrence Schobs
40
4
0
17 Jun 2021
MaCLR: Motion-aware Contrastive Learning of Representations for Videos
Fanyi Xiao
Joseph Tighe
Davide Modolo
SSL
68
14
0
17 Jun 2021
Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang
Gedas Bertasius
Du Tran
Lorenzo Torresani
VLM, ViT
153
50
0
17 Jun 2021
Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure
Paul Henderson
Christoph H. Lampert
Bernd Bickel
VGen
123
7
0
16 Jun 2021
$C^3$: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
67
2
0
16 Jun 2021
JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity Detection
Mahsa Ehsanpour
F. Saleh
Silvio Savarese
Ian Reid
Hamid Rezatofighi
81
44
0
16 Jun 2021