Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

22 May 2017
João Carreira
Andrew Zisserman

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 3,647 papers shown
Title
An Empirical Study of End-to-End Temporal Action Detection
Xiaolong Liu
S. Bai
Xiang Bai
96
60
0
06 Apr 2022
Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zerui Li
Cheng Lu
Jia Qin
Chunle Guo
Ming-Ming Cheng
110
153
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
99
22
0
06 Apr 2022
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
Mingfei Han
David Junhao Zhang
Yali Wang
Rui Yan
L. Yao
Xiaojun Chang
Yu Qiao
72
56
0
05 Apr 2022
Detector-Free Weakly Supervised Group Activity Recognition
Dongkeun Kim
Jin S. Lee
Minsu Cho
Suha Kwak
ViT
77
44
0
05 Apr 2022
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos
Shao-Wei Liu
Subarna Tripathi
Somdeb Majumdar
Xiaolong Wang
EgoV
115
97
0
04 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
140
106
0
04 Apr 2022
TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng
Gedas Bertasius
ViT
120
94
0
04 Apr 2022
Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding
Ziyue Wu
Junyu Gao
Shucheng Huang
Changsheng Xu
84
4
0
04 Apr 2022
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
Huazhang Hu
Sixun Dong
Yiqun Zhao
Dongze Lian
Zhengxin Li
Shenghua Gao
89
52
0
03 Apr 2022
Quantized GAN for Complex Music Generation from Dance Videos
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
118
46
0
01 Apr 2022
Vision Transformer with Cross-attention by Temporal Shift for Efficient Action Recognition
Ryota Hashiguchi
Toru Tamaki
47
6
0
01 Apr 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
117
25
0
01 Apr 2022
Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization
Junyu Gao
Mengyuan Chen
Changsheng Xu
62
71
0
31 Mar 2022
Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
Feng Cheng
Ming Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Li
Wei Xia
63
17
0
31 Mar 2022
Controllable Augmentations for Video Representation Learning
Rui Qian
Weiyao Lin
John See
Dian Li
SSL, AI4TS
54
10
0
30 Mar 2022
CycDA: Unsupervised Cycle Domain Adaptation from Image to Video
Wei Lin
Anna Kukleva
Kunyang Sun
Horst Possegger
Hilde Kuehne
Horst Bischof
VGen
135
7
0
30 Mar 2022
StyleFool: Fooling Video Classification Systems via Style Transfer
Yu Cao
Xi Xiao
Ruoxi Sun
Derui Wang
Minhui Xue
Sheng Wen
AAML
131
26
0
30 Mar 2022
Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation
Guang Feng
Lihe Zhang
Zhiwei Hu
Huchuan Lu
VOS
116
4
0
30 Mar 2022
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification
Shi Pu
Kaili Zhao
Mao Zheng
VLM
76
20
0
29 Mar 2022
End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection
Congcong Li
Xinyao Wang
Longyin Wen
Dexiang Hong
Tiejian Luo
Libo Zhang
78
17
0
29 Mar 2022
SPAct: Self-supervised Privacy Preservation for Action Recognition
I. Dave
Chong Chen
M. Shah
PICV
74
59
0
29 Mar 2022
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
Bo He
Xitong Yang
Le Kang
Zhiyu Cheng
Xingfa Zhou
Abhinav Shrivastava
79
81
0
29 Mar 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
Anthony L. Caterini
Animesh Garg
Guangwei Yu
101
162
0
28 Mar 2022
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning
Minghao Chen
Fangyun Wei
Chong Li
Deng Cai
AI4TS
105
35
0
28 Mar 2022
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
R Gnana Praveen
W. Melo
Nasib Ullah
Haseeb Aslam
Osama Zeeshan
...
M. Pedersoli
Alessandro Lameiras Koerich
Simon L Bacon
P. Cardinal
Eric Granger
122
71
0
28 Mar 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
100
221
0
28 Mar 2022
LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point Clouds
Jialian Li
Jingyi Zhang
Zhiyong Wang
Siqi Shen
Chenglu Wen
Yuexin Ma
Lan Xu
Jingyi Yu
Cheng-i Wang
3DPC
114
33
0
28 Mar 2022
Discovering Human-Object Interaction Concepts via Self-Compositional Learning
Zhi Hou
Baosheng Yu
Dacheng Tao
92
19
0
27 Mar 2022
Audio-Adaptive Activity Recognition Across Video Domains
Yun C. Zhang
Hazel Doughty
Ling Shao
Cees G. M. Snoek
75
42
0
27 Mar 2022
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
Muheng Li
Lei Chen
Yueqi Duan
Zhilan Hu
Jianjiang Feng
Jie Zhou
Jiwen Lu
79
76
0
26 Mar 2022
Class-Incremental Learning for Action Recognition in Videos
Jaeyoo Park
Minsoo Kang
Bohyung Han
CLL
84
52
0
25 Mar 2022
Learning to Adapt to Unseen Abnormal Activities under Weak Supervision
Jaeyoo Park
Junha Kim
Bohyung Han
OffRL
67
5
0
25 Mar 2022
Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
Reza Ghoddoosian
Isht Dwivedi
Nakul Agarwal
Chiho Choi
Behzad Dariush
69
19
0
24 Mar 2022
Movie Genre Classification by Language Augmentation and Shot Sampling
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
Xin Miao
Jiayi Liu
Huayan Wang
VLM, CLIP
70
1
0
24 Mar 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Leilei Gan
Yi Yang
Yueting Zhuang
Xinze Wang
103
75
0
24 Mar 2022
Interpretable Prediction of Pulmonary Hypertension in Newborns using Echocardiograms
H. Ragnarsdóttir
Laura Manduchi
H. Michel
F. Laumer
S. Wellmann
Ece Ozkan
Julia-Franziska Vogt
66
3
0
24 Mar 2022
Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection
Hitesh Sapkota
Qi Yu
76
40
0
24 Mar 2022
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
Ye Liu
Siyuan Li
Yang Wu
C. Chen
Ying Shan
Xiaohu Qie
ViT
113
151
0
23 Mar 2022
The Challenges of Continuous Self-Supervised Learning
Senthil Purushwalkam
Pedro Morgado
Abhinav Gupta
CLL
89
44
0
23 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
254
1,222
0
23 Mar 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
137
19
0
23 Mar 2022
Contrastive Transformer-based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection
Yu Tian
Guansong Pang
Fengbei Liu
Yuyuan Liu
Chong Wang
Yuanhong Chen
Johan Verjans
G. Carneiro
ViT, MedIm
87
29
0
23 Mar 2022
Enabling faster and more reliable sonographic assessment of gestational age through machine learning
Chace Lee
Angelica Willis
Christina W. Chen
M. Sieniek
Akib A Uddin
...
Rory Pilgrim
Katherine Chou
Daniel Tse
S. Shetty
Ryan G. Gomes
47
0
0
22 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
104
33
0
22 Mar 2022
Generative Adversarial Network for Future Hand Segmentation from Egocentric Video
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
79
14
0
21 Mar 2022
No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces
Jianqi Zhong
Kaichen Zhou
Qingyong Hu
Bing Wang
Niki Trigoni
Andrew Markham
3DPC
101
23
0
21 Mar 2022
Facial Expression Analysis Using Decomposed Multiscale Spatiotemporal Networks
W. Melo
Eric Granger
Miguel Bordallo López
CVBM
86
22
0
21 Mar 2022
LocATe: End-to-end Localization of Actions in 3D with Transformers
Jiankai Sun
Bolei Zhou
Michael J. Black
Arjun Chandrasekaran
143
8
0
21 Mar 2022
FAR: Fourier Aerial Video Recognition
D. Kothandaraman
Tianrui Guan
Xijun Wang
Sean Hu
Ming-Shun Lin
Tianyi Zhou
80
13
0
21 Mar 2022