Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

22 May 2017
João Carreira
Andrew Zisserman

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 3,647 papers shown
Shaping embodied agent behavior with activity-context priors from egocentric video
Tushar Nagarajan
Kristen Grauman
EgoV, LM&Ro
134
15
0
14 Oct 2021
Nuisance-Label Supervision: Robustness Improvement by Free Labels
Xinyue Wei
Weichao Qiu
Yi Zhang
Zihao Xiao
Alan Yuille
75
0
0
14 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
437
1,115
0
13 Oct 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
103
84
0
13 Oct 2021
Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning
Ankit Parag Shah
Shijie Geng
Peng Gao
A. Cherian
Takaaki Hori
Tim K. Marks
Jonathan Le Roux
Chiori Hori
68
24
0
13 Oct 2021
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions
Chenyu Yi
Siyuan Yang
Haoliang Li
Yap-Peng Tan
Alex C. Kot
97
33
0
13 Oct 2021
TAda! Temporally-Adaptive Convolutions for Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Mingqian Tang
Ziwei Liu
M. Ang
144
49
0
12 Oct 2021
Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble
Songyao Jiang
Bin Sun
Lichen Wang
Yue Bai
Kunpeng Li
Y. Fu
SLR
106
38
0
12 Oct 2021
Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos
Zongmeng Zhang
Xianjing Han
Xuemeng Song
Yan Yan
Liqiang Nie
120
37
0
12 Oct 2021
Rethinking Supervised Pre-training for Better Downstream Transferring
Yutong Feng
Jianwen Jiang
Mingqian Tang
Rong Jin
Yue Gao
SSL
147
41
0
12 Oct 2021
Video Is Graph: Structured Graph Module for Video Action Recognition
Rongjie Li
Xiaojun Wu
Tianyang Xu
95
12
0
12 Oct 2021
Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition
M. C. Leong
Hui Li Tan
Haosong Zhang
Liyuan Li
Feng Lin
J. Lim
70
10
0
12 Oct 2021
Relation-aware Video Reading Comprehension for Temporal Language Grounding
Jialin Gao
Xin Sun
Mengmeng Xu
Xi Zhou
Guohao Li
96
48
0
12 Oct 2021
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
Reza Ghoddoosian
S. Sayed
V. Athitsos
76
15
0
12 Oct 2021
Towards Streaming Egocentric Action Anticipation
Antonino Furnari
G. Farinella
EgoV
73
6
0
11 Oct 2021
SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
Hezhen Hu
Weichao Zhao
Wen-gang Zhou
Yuechen Wang
Houqiang Li
ViT
79
71
0
11 Oct 2021
High-order Tensor Pooling with Attention for Action Recognition
Lei Wang
Ke Sun
Piotr Koniusz
96
15
0
11 Oct 2021
Predicting decision-making in the future: Human versus Machine
H. Ryu
Uijong Ju
C. Wallraven
3DH
67
0
0
09 Oct 2021
Adversarial Attacks on Black Box Video Classifiers: Leveraging the Power of Geometric Transformations
Shasha Li
Abhishek Aich
Shitong Zhu
M. Salman Asif
Chengyu Song
Amit K. Roy-Chowdhury
S. Krishnamurthy
AAML
196
39
0
05 Oct 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
Jing Bi
Jiebo Luo
Chenliang Xu
128
49
0
05 Oct 2021
Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction
Rishubh Parihar
Gaurav Ramola
Ranajit Saha
Raviprasad Kini
Aniket Rege
S. Velusamy
72
1
0
03 Oct 2021
Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Elahe Vahdani
Yingli Tian
156
65
0
30 Sep 2021
Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation
Jay Patravali
Gaurav Mittal
Ye Yu
Fuxin Li
Mei Chen
94
19
0
30 Sep 2021
The Challenge of Appearance-Free Object Tracking with Feedforward Neural Networks
Girik Malik
Drew Linsley
Thomas Serre
E. Mingolla
VOT
72
7
0
30 Sep 2021
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
Shuangrui Ding
Maomao Li
Tianyu Yang
Rui Qian
Haohang Xu
Qingyi Chen
Jue Wang
Hongkai Xiong
SSL
96
51
0
30 Sep 2021
Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark
M. Wagner
Beat-Peter Müller-Stich
A. Kisilenko
Duc Tran
P. Heger
...
M. Frankenberg
F. Mathis-Ullrich
Lena Maier-Hein
Stefanie Speidel
S. Bodenstedt
94
78
0
30 Sep 2021
Three-Stream 3D/1D CNN for Fine-Grained Action Classification and Segmentation in Table Tennis
Pierre-Etienne Martin
J. Benois-Pineau
Renaud Péteri
J. Morlier
MedIm
83
14
0
29 Sep 2021
Information Elevation Network for Fast Online Action Detection
Sunah Min
Jinyoung Moon
34
0
0
28 Sep 2021
Physical Context and Timing Aware Sequence Generating GANs
Hayato Futase
Tomoki Tsujimura
Tetsuya Kajimoto
Hajime Kawarazaki
Toshiyuki Suzuki
Makoto Miwa
Yutaka Sasaki
GAN
146
0
0
28 Sep 2021
Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning
Zixu Zhao
Yueming Jin
Pheng-Ann Heng
SSL
82
21
0
28 Sep 2021
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device
Ji Lin
Chuang Gan
Kuan-Chieh Wang
Song Han
108
65
0
27 Sep 2021
Joint Multimedia Event Extraction from Video and Article
Brian Chen
Xudong Lin
Christopher Thomas
Manling Li
Shoya Yoshida
Lovish Chum
Heng Ji
Shih-Fu Chang
VGen
83
26
0
27 Sep 2021
Multi-Modal Multi-Instance Learning for Retinal Disease Recognition
Xirong Li
Yang Zhou
Jie Wang
Hailan Lin
Jianchun Zhao
Dayong Ding
Weihong Yu
You-xin Chen
120
38
0
25 Sep 2021
DeepStroke: An Efficient Stroke Screening Framework for Emergency Rooms with Multimodal Adversarial Deep Learning
Tongan Cai
Haomiao Ni
Ming-Chieh Yu
Xiaolei Huang
K. Wong
John Volpi
Jianmin Wang
Stephen T. C. Wong
70
16
0
24 Sep 2021
Long Short View Feature Decomposition via Contrastive Video Representation Learning
Nadine Behrmann
Mohsen Fayyaz
Juergen Gall
M. Noroozi
66
36
0
23 Sep 2021
Natural Language Video Localization with Learnable Moment Proposals
Shaoning Xiao
Long Chen
Jian Shao
Yueting Zhuang
Jun Xiao
87
43
0
22 Sep 2021
Unsupervised Abstract Reasoning for Raven's Problem Matrices
Tao Zhuo
Qian Huang
Mohan S. Kankanhalli
LRM
177
23
0
21 Sep 2021
Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions
D. Curto
Albert Clapés
Javier Selva
Sorina Smeureanu
Julio C. S. Jacques Junior
...
G. Guilera
D. Leiva
T. Moeslund
Sergio Escalera
Cristina Palmero
75
30
0
20 Sep 2021
V-SlowFast Network for Efficient Visual Sound Separation
Lingyu Zhu
Esa Rahtu
116
10
0
18 Sep 2021
A survey on deep learning approaches for breast cancer diagnosis
Timothy C. H. Kwong
S. Mazaheri
MedIm
68
4
0
18 Sep 2021
Towards High-Quality Temporal Action Detection with Sparse Proposals
Jiannan Wu
Pei Sun
Shoufa Chen
Jiewen Yang
Zihao Qi
Lan Ma
Ping Luo
ViT
73
10
0
18 Sep 2021
Unsupervised View-Invariant Human Posture Representation
Faegheh Sardari
Bjorn Ommer
Majid Mirmehdi
3DH
71
3
0
17 Sep 2021
Asymmetric 3D Context Fusion for Universal Lesion Detection
Jiancheng Yang
Yi He
Kaiming Kuang
Zudi Lin
Hanspeter Pfister
Bingbing Ni
3DPC, MedIm
95
23
0
17 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
224
373
0
17 Sep 2021
Overview of Tencent Multi-modal Ads Video Understanding Challenge
Zhenzhi Wang
Liyu Wu
Zhimin Li
Jiangfeng Xiong
Qinglin Lu
58
4
0
16 Sep 2021
Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Pan Zhou
92
46
0
14 Sep 2021
Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos
Daizong Liu
Xiaoye Qu
Jianfeng Dong
Pan Zhou
92
55
0
14 Sep 2021
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
Zhenzhi Wang
Limin Wang
Tao Wu
Tianhao Li
Gangshan Wu
AI4TS
112
122
0
10 Sep 2021
PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks
Jiankai Sun
De-An Huang
Bo Lu
Yunhui Liu
Bolei Zhou
Animesh Garg
76
56
0
10 Sep 2021
Learning to Combine the Modalities of Language and Video for Temporal Moment Localization
Jungkyoo Shin
Jinyoung Moon
64
8
0
07 Sep 2021