
© 2025 ResearchTrend.AI, All rights reserved.

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

22 May 2017
João Carreira
Andrew Zisserman

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 1,508 papers shown
Title
On Negative Sampling for Audio-Visual Contrastive Learning from Movies
Mahdi M. Kalayeh
Shervin Ardeshir
Lingyi Liu
Nagendra Kamath
Ashok Chandrashekar
SSL
35
3
0
29 Apr 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
40
2
0
28 Apr 2022
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
Alexandros Stergiou
Dima Damen
AI4TS
EgoV
EDL
26
7
0
28 Apr 2022
Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training
Guanhong Wang
Ke Lu
Yang Zhou
Zhanhao He
Gaoang Wang
SSL
32
3
0
27 Apr 2022
Contrastive Language-Action Pre-training for Temporal Localization
Mengmeng Xu
Erhan Gundogdu
Maksim
Guohao Li
M. Donoser
Loris Bazzani
38
27
0
26 Apr 2022
ClothFormer: Taming Video Virtual Try-on in All Module
Jianbin Jiang
Tan Wang
He Yan
Junhui Liu
40
25
0
26 Apr 2022
Temporal Relevance Analysis for Video Action Models
Quanfu Fan
Donghyun Kim
Chun-Fu Chen
Chen
Stan Sclaroff
Kate Saenko
Sarah Adel Bargal
FAtt
33
0
0
25 Apr 2022
iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition
Yixuan Wei
Yue Cao
Zheng Zhang
Zhuliang Yao
Zhenda Xie
Han Hu
B. Guo
VLM
29
11
0
22 Apr 2022
Video Moment Retrieval from Text Queries via Single Frame Annotation
Ran Cui
Tianwen Qian
Pai Peng
E. Daskalaki
Jingjing Chen
Xiao-Wei Guo
Huyang Sun
Yu-Gang Jiang
22
35
0
20 Apr 2022
Attention in Attention: Modeling Context Correlation for Efficient Video Classification
Y. Hao
Shuo Wang
P. Cao
Xinjian Gao
Tong Xu
Jinmeng Wu
Xiangnan He
39
41
0
20 Apr 2022
Sound-Guided Semantic Video Generation
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Chanyoung Kim
Wonjae Ryoo
Sang Ho Yoon
Hyunjun Cho
Jihyun Bae
Jinkyu Kim
Sangpil Kim
VGen
38
26
0
20 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
35
0
0
17 Apr 2022
3D Convolutional Networks for Action Recognition: Application to Sport Gesture Recognition
Pierre-Etienne Martin
J. Benois-Pineau
Renaud Péteri
A. Zemmari
J. Morlier
32
5
0
13 Apr 2022
Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization
Zhixi Cai
Kalin Stefanov
Abhinav Dhall
Munawar Hayat
27
3
0
13 Apr 2022
Calibrating Class Weights with Multi-Modal Information for Partial Video Domain Adaptation
Xiyu Wang
Yuecong Xu
K. Mao
Jianfei Yang
26
8
0
13 Apr 2022
Position-aware Location Regression Network for Temporal Video Grounding
Sunoh Kim
Kimin Yun
J. Choi
27
4
0
12 Apr 2022
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition
C. Nwoye
Deepak Alapatt
Tong Yu
Armine Vardazaryan
Fangfang Xia
...
Didier Mutter
Pietro Mascagni
B. Seeliger
Cristians Gonzalez
N. Padoy
25
50
0
10 Apr 2022
Self-Supervised Video Representation Learning with Motion-Contrastive Perception
Jin-Yuan Liu
Ying Cheng
Yuejie Zhang
Ruiwei Zhao
Rui Feng
SSL
26
1
0
10 Apr 2022
Multimodal Transformer for Nursing Activity Recognition
Momal Ijaz
Renato Diaz
Chong Chen
ViT
35
26
0
09 Apr 2022
Probabilistic Representations for Video Contrastive Learning
Jungin Park
Jiyoung Lee
Ig-Jae Kim
Kwanghoon Sohn
SSL
40
44
0
08 Apr 2022
Frequency Selective Augmentation for Video Representation Learning
Jinhyung Kim
Taeoh Kim
Minho Shim
Dongyoon Han
Dongyoon Wee
Junmo Kim
AI4TS
54
3
0
08 Apr 2022
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Jinglin Xu
Yongming Rao
Xumin Yu
Guangyi Chen
Jie Zhou
Jiwen Lu
30
88
0
07 Apr 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Songwei Ge
Thomas Hayes
Harry Yang
Xiaoyue Yin
Guan Pang
David Jacobs
Jia-Bin Huang
Devi Parikh
ViT
62
215
0
07 Apr 2022
Video Diffusion Models
Jonathan Ho
Tim Salimans
Alexey A. Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet
DiffM
VGen
101
1,533
0
07 Apr 2022
Continual Inference: A Library for Efficient Online Inference with Deep Neural Networks in PyTorch
Lukas Hedegaard
Alexandros Iosifidis
BDL
3DV
CLL
17
6
0
07 Apr 2022
Detection of Distracted Driver using Convolution Neural Network
Narayana Darapaneni
Jai Arora
MoniShankar Hazra
Naman Vig
Simrandeep Singh Gandhi
Saurabh Gupta
A. Paduri
13
8
0
07 Apr 2022
Hierarchical Self-supervised Representation Learning for Movie Understanding
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
46
24
0
06 Apr 2022
Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yi Tian Xu
Xiang Wang
Mingqian Tang
Changxin Gao
Rong Jin
Nong Sang
SSL
AI4TS
33
17
0
06 Apr 2022
Video Demoireing with Relation-Based Temporal Consistency
Peng Dai
Xin Yu
Lan Ma
Baoheng Zhang
Jia Li
Wenbo Li
Jiajun Shen
Xiaojuan Qi
34
25
0
06 Apr 2022
An Empirical Study of End-to-End Temporal Action Detection
Xiaolong Liu
S. Bai
Xiang Bai
27
58
0
06 Apr 2022
Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zerui Li
Cheng Lu
Jia Qin
Chunle Guo
Ming-Ming Cheng
60
149
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
29
21
0
06 Apr 2022
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos
Shao-Wei Liu
Subarna Tripathi
Somdeb Majumdar
Xiaolong Wang
EgoV
45
93
0
04 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
56
102
0
04 Apr 2022
TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng
Gedas Bertasius
ViT
37
92
0
04 Apr 2022
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
Huazhang Hu
Sixun Dong
Yiqun Zhao
Dongze Lian
Zhengxin Li
Shenghua Gao
26
47
0
03 Apr 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
17
24
0
01 Apr 2022
Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization
Junyu Gao
Mengyuan Chen
Changsheng Xu
20
66
0
31 Mar 2022
End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection
Congcong Li
Xinyao Wang
Longyin Wen
Dexiang Hong
Tiejian Luo
Libo Zhang
30
16
0
29 Mar 2022
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
Bo He
Xitong Yang
Le Kang
Zhiyu Cheng
Xingfa Zhou
Abhinav Shrivastava
35
77
0
29 Mar 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
M. Volkovs
Animesh Garg
Guangwei Yu
44
153
0
28 Mar 2022
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
R Gnana Praveen
W. Melo
Nasib Ullah
Haseeb Aslam
Osama Zeeshan
...
M. Pedersoli
Alessandro Lameiras Koerich
Simon L Bacon
P. Cardinal
Eric Granger
30
68
0
28 Mar 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
38
205
0
28 Mar 2022
Discovering Human-Object Interaction Concepts via Self-Compositional Learning
Zhi Hou
Baosheng Yu
Dacheng Tao
27
18
0
27 Mar 2022
Class-Incremental Learning for Action Recognition in Videos
Jaeyoo Park
Minsoo Kang
Bohyung Han
CLL
24
52
0
25 Mar 2022
Learning to Adapt to Unseen Abnormal Activities under Weak Supervision
Jaeyoo Park
Junha Kim
Bohyung Han
OffRL
23
5
0
25 Mar 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Fei Wu
Yi Yang
Yueting Zhuang
Xinze Wang
44
73
0
24 Mar 2022
Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection
Hitesh Sapkota
Qi Yu
16
39
0
24 Mar 2022
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
Ye Liu
Siyuan Li
Yang Wu
C. Chen
Ying Shan
Xiaohu Qie
ViT
29
141
0
23 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
170
1,134
0
23 Mar 2022