ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

22 May 2017
João Carreira
Andrew Zisserman
arXiv: 1705.07750

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 3,647 papers shown
Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition
Xiaoguang Zhu
Ye Zhu
Haoyu Wang
Honglin Wen
Yan Yan
Peilin Liu
101
28
0
23 Feb 2022
Movies2Scenes: Using Movie Metadata to Learn Scene Representation
Shixing Chen
Chundi Liu
Xiang Hao
Xiaohan Nie
Maxim Arap
Raffay Hamid
69
17
0
22 Feb 2022
Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations
Yoshihiro Yamazaki
Shota Orihashi
Ryo Masumura
Mihiro Uchida
Akihiko Takashima
45
8
0
21 Feb 2022
Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study
Yuecong Xu
Jianfei Yang
Haozhi Cao
Jianxiong Yin
Zhenghua Chen
Xiaoli Li
Zhengguo Li
Qiaoqiao Xu
83
2
0
19 Feb 2022
Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses
Phoebe K. Chua
D. Makris
Dorien Herremans
Gemma Roig
Design
87
9
0
19 Feb 2022
(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering
A. Cherian
Chiori Hori
Tim K. Marks
Jonathan Le Roux
113
38
0
18 Feb 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
186
228
0
18 Feb 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
59
3
0
16 Feb 2022
ActionFormer: Localizing Moments of Actions with Transformers
Chen-Da Liu-Zhang
Jianxin Wu
Yin Li
ViT
129
342
0
16 Feb 2022
HAKE: A Knowledge Engine Foundation for Human Activity Understanding
Yong-Lu Li
Xinpeng Liu
Xiaoqian Wu
Yizhuo Li
Zuoyu Qiu
Liang Xu
Yue Xu
Haoshu Fang
Cewu Lu
99
38
0
14 Feb 2022
Adaptive Graph Convolutional Networks for Weakly Supervised Anomaly Detection in Videos
Congqi Cao
Xin Zhang
Shizhou Zhang
Peng Wang
Yanning Zhang
AI4TS
107
23
0
14 Feb 2022
Robust Deepfake On Unrestricted Media: Generation And Detection
Trung-Nghia Le
H. Nguyen
Junichi Yamagishi
Isao Echizen
97
8
0
13 Feb 2022
Learning Temporal Rules from Noisy Timeseries Data
Karan Samel
Zelin Zhao
Binghong Chen
Shuang Li
D. Subramanian
Irfan Essa
Le Song
NoLa, NAI, AI4TS, AI4CE
24
2
0
11 Feb 2022
Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks
Nan Wu
Stanislaw Jastrzebski
Kyunghyun Cho
Krzysztof J. Geras
77
76
0
10 Feb 2022
Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition
Zhigang Tu
Jiaxu Zhang
Hongyan Li
Yujin Chen
Junsong Yuan
86
86
0
08 Feb 2022
A Coding Framework and Benchmark towards Compressed Video Understanding
Yuan Tian
Guo Lu
Yichao Yan
Guangtao Zhai
Lixing Chen
Zhiyong Gao
83
25
0
06 Feb 2022
Towards To-a-T Spatio-Temporal Focus for Skeleton-Based Action Recognition
Lipeng Ke
Kuan-Chuan Peng
Siwei Lyu
3DPC
67
34
0
04 Feb 2022
Video Violence Recognition and Localization Using a Semi-Supervised Hard Attention Model
Hamid Reza Mohammadi
Ehsan Nazerfard
106
25
0
04 Feb 2022
MMSys'22 Grand Challenge on AI-based Video Production for Soccer
Cise Midoglu
Steven A. Hicks
Vajira Thambawita
T. Kupka
Pål Halvorsen
VGen
92
14
0
02 Feb 2022
Should I take a walk? Estimating Energy Expenditure from Video Data
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
75
4
0
01 Feb 2022
A Dataset for Medical Instructional Video Classification and Question Answering
D. Gupta
Kush Attal
Dina Demner-Fushman
114
33
0
30 Jan 2022
TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images
Jiangyun Li
Wenxuan Wang
Chen Chen
Tianxiang Zhang
Sen Zha
Jing Wang
Hong Yu
ViT, MedIm
130
25
0
30 Jan 2022
Assessing Cross-dataset Generalization of Pedestrian Crossing Predictors
Joseph Gesnouin
Steve Pechberti
B. Stanciulescu
Fabien Moutarde
67
13
0
29 Jan 2022
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
111
87
0
26 Jan 2022
Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition
Kiyoon Kim
Shreyank N. Gowda
Oisin Mac Aodha
Laura Sevilla-Lara
111
10
0
25 Jan 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
250
384
0
24 Jan 2022
vCLIMB: A Novel Video Class Incremental Learning Benchmark
Andrés Villa
Kumail Alhamoud
Juan Carlos León Alcázar
Fabian Caba Heilbron
Victor Escorcia
Guohao Li
CLL
165
33
0
23 Jan 2022
Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval
Jianfeng Dong
Yabing Wang
Xianke Chen
Xiaoye Qu
Xirong Li
Y. He
Xun Wang
87
59
0
23 Jan 2022
LTC-GIF: Attracting More Clicks on Feature-length Sports Videos
Ghulam Mujtaba
Jaehyuk Choi
Eun‐Seok Ryu
44
0
0
22 Jan 2022
VIPriors 2: Visual Inductive Priors for Data-Efficient Deep Learning Challenges
A. Lengyel
Robert-Jan Bruintjes
Marcos Baptista-Rios
O. Kayhan
Davide Zambrano
Nergis Tomen
Jan van Gemert
VLM
78
11
0
21 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
292
237
0
20 Jan 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
89
170
0
20 Jan 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
3DGS
116
41
0
20 Jan 2022
VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation
Yihang Li
Shuichiro Shimizu
Weiqi Gu
Chenhui Chu
Sadao Kurohashi
57
15
0
20 Jan 2022
Action Keypoint Network for Efficient Video Recognition
Xu Chen
Yahong Han
Xiaohan Wang
Yifang Sun
Yi Yang
3DPC
82
6
0
17 Jan 2022
Continual Transformers: Redundancy-Free Attention for Online Inference
Lukas Hedegaard
Arian Bakhtiarnia
Alexandros Iosifidis
CLL
110
12
0
17 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
143
107
0
16 Jan 2022
Towards Zero-shot Sign Language Recognition
Yunus Can Bilge
R. G. Cinbis
Nazli Ikizler-Cinbis
SLR
58
36
0
15 Jan 2022
Learning Temporally and Semantically Consistent Unpaired Video-to-video Translation Through Pseudo-Supervision From Synthetic Optical Flow
Kaihong Wang
Kumar Akash
Teruhisa Misu
83
11
0
15 Jan 2022
Transformers in Action: Weakly Supervised Action Segmentation
John Ridley
Huseyin Coskun
D. Tan
Nassir Navab
F. Tombari
ViT
48
5
0
14 Jan 2022
Hand-Object Interaction Reasoning
Jian Ma
Dima Damen
71
7
0
13 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
145
254
0
12 Jan 2022
OCSampler: Compressing Videos to One Clip with Single-step Sampling
Jintao Lin
Haodong Duan
Kai-xiang Chen
Dahua Lin
Limin Wang
78
24
0
12 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
99
221
0
12 Jan 2022
Motion-Focused Contrastive Learning of Video Representations
Rui Li
Yiheng Zhang
Zhaofan Qiu
Ting Yao
Dong Liu
Tao Mei
SSL
92
35
0
11 Jan 2022
Representing Videos as Discriminative Sub-graphs for Action Recognition
Dong Li
Zhaofan Qiu
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
108
26
0
11 Jan 2022
Boosting Video Representation Learning with Multi-Faceted Integration
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Xiaoping Zhang
Dong Wu
Tao Mei
63
9
0
11 Jan 2022
Condensing a Sequence to One Informative Frame for Video Recognition
Zhaofan Qiu
Ting Yao
Y. Shu
Chong-Wah Ngo
Tao Mei
152
9
0
11 Jan 2022
Optimization Planning for 3D ConvNets
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
3DPC, 3DH
91
9
0
11 Jan 2022
TSA-Net: Tube Self-Attention Network for Action Quality Assessment
Shunli Wang
Dingkang Yang
Peng Zhai
Chixiao Chen
Lihua Zhang
ViT
81
70
0
11 Jan 2022