ResearchTrend.AI
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

22 May 2017
João Carreira
Andrew Zisserman

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 3,647 papers shown
Spatial-temporal Concept based Explanation of 3D ConvNets
Yi Ji
Yu Wang
K. Mori
Jien Kato
3DPC, FAtt
92
7
0
09 Jun 2022
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
Zihan Ding
Tianrui Hui
Junshi Huang
Xiaoming Wei
Jizhong Han
Si Liu
VOS
73
55
0
08 Jun 2022
Generating Long Videos of Dynamic Scenes
Tim Brooks
Janne Hellsten
M. Aittala
Ting-Chun Wang
Timo Aila
J. Lehtinen
Xuan Li
Alexei A. Efros
Tero Karras
SyDa
104
114
0
07 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
96
115
0
07 Jun 2022
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector
Lin Sui
Chen-Da Liu-Zhang
Lixin Gu
Feng Han
143
8
0
07 Jun 2022
TadML: A fast temporal action detection with Mechanics-MLP
Bowen Deng
Dongchang Liu
83
1
0
07 Jun 2022
A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information
M. Kowal
Mennatullah Siam
Md. Amirul Islam
Neil D. B. Bruce
Richard P. Wildes
Konstantinos G. Derpanis
70
26
0
06 Jun 2022
3D Convolutional with Attention for Action Recognition
Labina Shrestha
Shikha Dubey
Farrukh Olimov
M. Rafique
M. Jeon
38
0
0
05 Jun 2022
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
Xudong Lin
Simran Tiwari
Shiyuan Huang
Manling Li
Mike Zheng Shou
Heng Ji
Shih-Fu Chang
138
21
0
05 Jun 2022
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation
Mingjie Li
Wenjia Cai
Karin Verspoor
Shirui Pan
Xiaodan Liang
Xiaojun Chang
MedIm
88
38
0
04 Jun 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
102
166
0
03 Jun 2022
Egocentric Video-Language Pretraining
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
...
Hongfa Wang
Dima Damen
Guohao Li
Wei Liu
Mike Zheng Shou
VLM, EgoV
104
207
0
03 Jun 2022
Anomaly detection in surveillance videos using transformer based attention model
Kapil Deshpande
Narinder Singh Punn
S. K. Sonbhadra
Sonali Agarwal
ViT, AI4TS
74
12
0
03 Jun 2022
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li
Junyu Chen
Yucheng Tang
Ce Wang
Bennett A. Landman
S. K. Zhou
ViT, OOD, MedIm
181
46
0
02 Jun 2022
A temporal chrominance trigger for clean-label backdoor attack against anti-spoof rebroadcast detection
Wei Guo
B. Tondi
Mauro Barni
AAML
66
13
0
02 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
95
57
0
02 Jun 2022
Cascaded Video Generation for Videos In-the-Wild
Lluis Castrejon
Nicolas Ballas
Aaron Courville
VGen
86
0
0
01 Jun 2022
Dual-stream spatiotemporal networks with feature sharing for monitoring animals in the home cage
Ezechukwu I. Nwokedi
R. Bains
L. Bidaut
Xujiong Ye
Sara Wells
James M. Brown
79
2
0
01 Jun 2022
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering
Jiangtong Li
Li Niu
Liqing Zhang
67
53
0
30 May 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
389
633
0
29 May 2022
Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning
Yanxing Song
Jianzong Wang
Tianbo Wu
Zhangcheng Huang
Jing Xiao
CVBM
131
2
0
29 May 2022
Future Transformer for Long-term Action Anticipation
Dayoung Gong
Joonseok Lee
Manjin Kim
S. Ha
Minsu Cho
AI4TS
53
66
0
27 May 2022
PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences
Hehe Fan
Xin Yu
Yuhang Ding
Yi Yang
Mohan Kankanhalli
3DPC
190
113
0
27 May 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
259
706
0
26 May 2022
Do we really need temporal convolutions in action segmentation?
Dazhao Du
Fuchun Sun
Yu Li
Zhongang Qi
Hui Xiong
Ying Shan
ViT
72
17
0
26 May 2022
Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
Manuel Traub
S. Otte
Tobias Menge
Matthias Karlbauer
Jannik Thummel
Martin Volker Butz
115
20
0
26 May 2022
VIDI: A Video Dataset of Incidents
Duygu Sesver
Alp Eren Gençoglu
Ç. Yildiz
Zehra Günindi
Faeze Habibi
Z. A. Yazici
H. K. Ekenel
68
4
0
26 May 2022
You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos
Xin Sun
Xinyu Wang
Jialin Gao
Qiong Liu
Xiaoping Zhou
96
34
0
25 May 2022
Detection of Fights in Videos: A Comparison Study of Anomaly Detection and Action Recognition
Weijun Tan
Jingfeng Liu
71
8
0
23 May 2022
Deep Learning for Visual Speech Analysis: A Survey
Changchong Sheng
Gangyao Kuang
L. Bai
Chen Hou
Y. Guo
Xin Xu
M. Pietikäinen
Li Liu
VLM
98
36
0
22 May 2022
GL-RG: Global-Local Representation Granularity for Video Captioning
Liqi Yan
Qifan Wang
Yiming Cui
Fuli Feng
Xiaojun Quan
Xinming Zhang
Dongfang Liu
125
59
0
22 May 2022
Structured Attention Composition for Temporal Action Localization
Le Yang
Junwei Han
Tao Zhao
Nian Liu
Dingwen Zhang
84
17
0
20 May 2022
Cross-Enhancement Transformer for Action Segmentation
Jiahui Wang
Zhenyou Wang
Shanna Zhuang
Hui Wang
ViT
97
23
0
19 May 2022
PYSKL: Towards Good Practices for Skeleton Action Recognition
Haodong Duan
Jiaqi Wang
Kai-xiang Chen
Dahua Lin
VLM
88
147
0
19 May 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
202
62
0
17 May 2022
Learnable Optimal Sequential Grouping for Video Scene Detection
Daniel Rotman
Yevgeny Yaroker
Elad Amrani
Udi Barzelay
Rami Ben-Ari
35
10
0
17 May 2022
ETAD: Training Action Detection End to End on a Laptop
Shuming Liu
Mengmeng Xu
Chen Zhao
Xu Zhao
Guohao Li
78
7
0
14 May 2022
Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild
Fuyan Ma
Bin Sun
Shutao Li
ViT
57
31
0
10 May 2022
Scaling up sign spotting through sign language dictionaries
Gül Varol
Liliane Momeni
Samuel Albanie
Triantafyllos Afouras
Andrew Zisserman
71
15
0
09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
94
128
0
08 May 2022
Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study
Liqun Lin
Zheng Wang
Jiachen He
Weiling Chen
Yiwen Xu
Tiesong Zhao
84
7
0
07 May 2022
Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement
Bing Li
Jiaxin Chen
Dongming Zhang
Xiuguo Bao
Di Huang
56
15
0
07 May 2022
An Empirical Study on Activity Recognition in Long Surgical Videos
Zhuohong He
A. Mottaghi
Aidean Sharghi
Muhammad Abdullah Jamal
Omid Mohareri
90
12
0
05 May 2022
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Mingdong Yang
Guo Chen
Yin-Dong Zheng
Tong Lu
Limin Wang
100
48
0
05 May 2022
Deep Neural Network approaches for Analysing Videos of Music Performances
F. Liwicki
Richa Upadhyay
Prakash Chandra Chhipa
Killian Murphy
F. Visi
S. Östersjö
Marcus Liwicki
64
1
0
05 May 2022
ANUBIS: Skeleton Action Recognition Dataset, Review, and Benchmark
Zhenyue Qin
Yang Liu
Madhawa Perera
Tom Gedeon
Pan Ji
Dongwoo Kim
Saeed Anwar
60
4
0
04 May 2022
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
Haodong Duan
Nanxuan Zhao
Kai-xiang Chen
Dahua Lin
ViT, AI4TS
82
19
0
04 May 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
74
0
0
03 May 2022
Cross-modal Representation Learning for Zero-shot Action Recognition
Chung-Ching Lin
Kevin Qinghong Lin
Linjie Li
Lijuan Wang
Zicheng Liu
ViT
62
29
0
03 May 2022
Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization
Qinying Liu
Zilei Wang
Ruoxi Chen
Zhilin Li
74
4
0
01 May 2022