ResearchTrend.AI
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Versions: v1, v2, v3 (latest)


22 May 2017
João Carreira
Andrew Zisserman
arXiv: 1705.07750

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 3,647 papers shown
OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification
Ye Liu
Lingfeng Qiao
Di Yin
Zhuoxuan Jiang
Xinghua Jiang
Deqiang Jiang
Bo Ren
54
7
0
04 Jul 2022
Continuous Sign Language Recognition via Temporal Super-Resolution Network
Qidan Zhu
Jing Li
Fei Yuan
Quan Gan
SLR
53
12
0
03 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
79
2
0
02 Jul 2022
Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding
Zeyu Xiong
Daizong Liu
Technology
37
8
0
02 Jul 2022
Turning to a Teacher for Timestamp Supervised Temporal Action Segmentation
Yang Zhao
Yan Song
120
4
0
02 Jul 2022
Video + CLIP Baseline for Ego4D Long-term Action Anticipation
Srijan Das
Michael S. Ryoo
VLM CLIP
70
17
0
01 Jul 2022
Dissecting Self-Supervised Learning Methods for Surgical Computer Vision
Sanat Ramesh
V. Srivastav
Deepak Alapatt
Tong Yu
Aditya Murali
...
Saurav Sharma
A. Fleurentin
Georgios Exarchakis
Alexandros Karargyris
N. Padoy
134
46
0
01 Jul 2022
COVID Detection and Severity Prediction with 3D-ConvNeXt and Custom Pretrainings
Daniel Kienzle
Julian Lorenz
Robin Schon
K. Ludwig
Rainer Lienhart
3DPC
69
14
0
30 Jun 2022
Timestamp-Supervised Action Segmentation with Graph Convolutional Networks
Hamza Khan
S. Haresh
Awais Ahmed
Shakeeb Siddiqui
Andrey Konin
Mohammad Zeeshan
Quoc-Huy Tran
94
23
0
30 Jun 2022
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Taeoh Kim
Jinhyung Kim
Minho Shim
Sangdoo Yun
Myunggu Kang
Dongyoon Wee
Sangyoun Lee
AI4TS
128
10
0
30 Jun 2022
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Junting Pan
Ziyi Lin
Xiatian Zhu
Jing Shao
Hongsheng Li
109
207
0
27 Jun 2022
Programmatic Concept Learning for Human Motion Description and Synthesis
Sumith Kulal
Jiayuan Mao
A. Aiken
Jiajun Wu
116
8
0
27 Jun 2022
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
Kashu Yamazaki
Sang Truong
Khoa T. Vo
Michael Kidd
Chase Rainwater
Khoa Luu
Ngan Le
VLM CoGe
65
26
0
26 Jun 2022
Video Activity Localisation with Uncertainties in Temporal Boundary
Jiabo Huang
Hailin Jin
S. Gong
Yang Liu
108
24
0
26 Jun 2022
Semantic Role Aware Correlation Transformer for Text to Video Retrieval
Burak Satar
Erik Cambria
Xavier Bresson
J. Lim
ViT
36
10
0
26 Jun 2022
RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval
Burak Satar
Erik Cambria
Hanwang Zhang
J. Lim
79
11
0
26 Jun 2022
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos
S. H. Khorasgani
Yuxuan Chen
Florian Shkurti
SSL
114
24
0
25 Jun 2022
Learning to Refactor Action and Co-occurrence Features for Temporal Action Localization
Kun Xia
Le Wang
Sanping Zhou
Nanning Zheng
Wei Tang
99
38
0
23 Jun 2022
Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
Kailai Zhou
Yibo Wang
Tao Lv
Yunqian Li
Linsen Chen
Qiu Shen
Xun Cao
74
11
0
23 Jun 2022
Motion Gait: Gait Recognition via Motion Excitation
Yunpeng Zhang
Zhengyou Wang
Shanna Zhuang
Hui Wang
CVBM
48
1
0
22 Jun 2022
Weakly-Supervised Temporal Action Localization by Progressive Complementary Learning
Jiachen Du
Jialuo Feng
Kun-Yu Lin
Fa-Ting Hong
Xiao-Ming Wu
Zhongang Qi
Ying Shan
Weihao Zheng
105
5
0
22 Jun 2022
Symmetric Network with Spatial Relationship Modeling for Natural Language-based Vehicle Retrieval
Chuyang Zhao
Haobo Chen
Wenyuan Zhang
Junru Chen
Sipeng Zhang
Yadong Li
Boxun Li
71
10
0
22 Jun 2022
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning
Fuchen Long
Ting Yao
Zhaofan Qiu
Xinmei Tian
Jiebo Luo
Tao Mei
77
6
0
21 Jun 2022
Probing Visual-Audio Representation for Video Highlight Detection via Hard-Pairs Guided Contrastive Learning
Shuaicheng Li
Feng Zhang
Kunlin Yang
Lin-Na Liu
Shinan Liu
Jun Hou
Shuai Yi
103
9
0
21 Jun 2022
KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing
Xuanhan Wang
Jingkuan Song
Xiaojia Chen
Lechao Cheng
Lianli Gao
Hengtao Shen
80
9
0
21 Jun 2022
Pyramid Region-based Slot Attention Network for Temporal Action Proposal Generation
Shuaicheng Li
Feng Zhang
Ruiwei Zhao
Rui Feng
Kunlin Yang
Lin-Na Liu
Jun Hou
ViT
84
5
0
21 Jun 2022
DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment
Haoning Wu
Chao-Yu Chen
Liang Liao
Jingwen Hou
Wenxiu Sun
Qiong Yan
Weisi Lin
ViT
77
53
0
20 Jun 2022
M&M Mix: A Multimodal Multiview Transformer Ensemble
Xuehan Xiong
Anurag Arnab
Arsha Nagrani
Cordelia Schmid
ViT
70
20
0
20 Jun 2022
Context-aware Proposal Network for Temporal Action Detection
Xiang Wang
Han Zhang
Shiwei Zhang
Changxin Gao
Yuanjie Shao
Nong Sang
68
2
0
18 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
143
136
0
18 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
171
388
0
17 Jun 2022
Learning Using Privileged Information for Zero-Shot Action Recognition
Zhiyi Gao
Yonghong Hou
Wanqing Li
Zihui Guo
Ting Yu
50
2
0
17 Jun 2022
Scalable Temporal Localization of Sensitive Activities in Movies and TV Episodes
Xiang Hao
Jingxiang Chen
Shixing Chen
Ahmed Saad
Raffay Hamid
AI4TS
109
0
0
16 Jun 2022
Going Deeper than Tracking: a Survey of Computer-Vision Based Recognition of Animal Pain and Affective States
Sofia Broomé
Marcelo Feighelstein
Anna Zamansky
Gabriel Carreira Lencioni
P. Andersen
Francisca Pessanha
M. Mahmoud
Hedvig Kjellström
A. A. Salah
87
11
0
16 Jun 2022
Learning Generic Lung Ultrasound Biomarkers for Decoupling Feature Extraction from Downstream Tasks
Gautam Rajendrakumar Gare
Thomas H. Fox
P. Lowery
K. Zamora
H. V. Tran
...
Amita Krishnan
Deva Ramanan
R. Rodriguez
Bennett P deBoisblanc
J. Galeotti
68
3
0
16 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
120
99
0
16 Jun 2022
Backbones-Review: Feature Extraction Networks for Deep Learning and Deep Reinforcement Learning Approaches
O. Elharrouss
Y. Akbari
Noor Almaadeed
S. Al-Maadeed
88
75
0
16 Jun 2022
Analysis and Extensions of Adversarial Training for Video Classification
K. A. Kinfu
René Vidal
AAML
93
13
0
16 Jun 2022
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
Ye Zhu
Yuehua Wu
Kyle Olszewski
Jian Ren
Sergey Tulyakov
Yan Yan
DiffM
109
49
0
15 Jun 2022
Stand-Alone Inter-Frame Attention in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Jiebo Luo
Tao Mei
ViT
69
47
0
14 Jun 2022
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks
Shanghua Gao
Zhong-Yu Li
Qi Han
Ming-Ming Cheng
Liang Wang
106
35
0
14 Jun 2022
Med-DANet: Dynamic Architecture Network for Efficient Medical Volumetric Segmentation
Wenxuan Wang
Chen Chen
Jing Wang
Sen Zha
Yan Zhang
Jiangyun Li
MedIm
75
10
0
14 Jun 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben-Avraham
Roei Herzig
K. Mangalam
Amir Bar
Anna Rohrbach
Leonid Karlinsky
Trevor Darrell
Amir Globerson
94
0
0
13 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
79
15
0
13 Jun 2022
DRNet: Decomposition and Reconstruction Network for Remote Physiological Measurement
Yuhang Dong
Gongping Yang
Yilong Yin
50
4
0
12 Jun 2022
Lost in Transmission: On the Impact of Networking Corruptions on Video Machine Learning Models
Trenton Chang
Daniel Y. Fu
37
0
0
10 Jun 2022
NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression Recognition
Hanting Li
Ming-Fa Sui
Zhaoqing Zhu
Feng Zhao
72
29
0
10 Jun 2022
Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition
Shreyank N. Gowda
Marcus Rohrbach
Frank Keller
Laura Sevilla-Lara
110
37
0
09 Jun 2022
GateHUB: Gated History Unit with Background Suppression for Online Action Detection
Junwen Chen
Gaurav Mittal
Ye Yu
Yu Kong
Mei Chen
93
37
0
09 Jun 2022
VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
Zeyuan Chen
Yinbo Chen
Jingwen Liu
Xingqian Xu
Vidit Goel
Zhangyang Wang
Humphrey Shi
Xiaolong Wang
SupR
121
111
0
09 Jun 2022