ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1705.07750
  4. Cited By
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
v1v2v3 (latest)

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

22 May 2017
João Carreira
Andrew Zisserman
ArXiv (abs)PDFHTML

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 3,647 papers shown
Title
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for
  Few-shot Video Classification
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification
Rex Liu
Huan Zhang
Hamed Pirsiavash
Xin Liu
ViT
92
13
0
08 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation
  Learning
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge Belongie
Ming-Hsuan Yang
Hartwig Adam
Huayu Chen
AI4TS
96
6
0
08 Dec 2021
Prompting Visual-Language Models for Efficient Video Understanding
Prompting Visual-Language Models for Efficient Video Understanding
Chen Ju
Tengda Han
Kunhao Zheng
Ya Zhang
Weidi Xie
VPVLMVLM
139
384
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
132
135
0
08 Dec 2021
Classification-Then-Grounding: Reformulating Video Scene Graphs as
  Temporal Bipartite Graphs
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs
Kaifeng Gao
Long Chen
Yulei Niu
Jian Shao
Jun Xiao
68
29
0
08 Dec 2021
SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language
  Video Localization
SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization
Wenbo Gou
Wen Shi
Jian Lou
Lijie Huang
Pan Zhou
Ruixuan Li
AAML
74
2
0
08 Dec 2021
Cross-modal Manifold Cutmix for Self-supervised Video Representation
  Learning
Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
Srijan Das
Michael S. Ryoo
SSL
48
0
0
07 Dec 2021
ViewCLR: Learning Self-supervised Video Representation for Unseen
  Viewpoints
ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints
Srijan Das
Michael S. Ryoo
SSL
90
20
0
07 Dec 2021
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Rui Dai
Srijan Das
Kumara Kahatapitiya
Michael S. Ryoo
Francois Bremond
ViT
109
73
0
07 Dec 2021
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised
  Video Representation Learning
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning
Manlin Zhang
Jinpeng Wang
A. J. Ma
85
9
0
07 Dec 2021
Regularity Learning via Explicit Distribution Modeling for Skeletal
  Video Anomaly Detection
Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection
Shoubin Yu
Zhong-Hua Zhao
Haoshu Fang
Andong Deng
Haisheng Su
Dongliang Wang
Weihao Gan
Cewu Lu
Wei Wu
99
19
0
07 Dec 2021
DCAN: Improving Temporal Action Detection via Dual Context Aggregation
DCAN: Improving Temporal Action Detection via Dual Context Aggregation
Guo Chen
Yin-Dong Zheng
Limin Wang
Tong Lu
AI4TS
139
74
0
07 Dec 2021
E$^2$(GO)MOTION: Motion Augmented Event Stream for Egocentric Action
  Recognition
E2^22(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
Chiara Plizzari
M. Planamente
Gabriele Goletto
Marco Cannici
Emanuele Gusso
Matteo Matteucci
Barbara Caputo
EgoV
104
57
0
07 Dec 2021
STSM: Spatio-Temporal Shift Module for Efficient Action Recognition
STSM: Spatio-Temporal Shift Module for Efficient Action Recognition
Zhaoqilin Yang
Gaoyun An
69
5
0
05 Dec 2021
PointCLIP: Point Cloud Understanding by CLIP
PointCLIP: Point Cloud Understanding by CLIP
Renrui Zhang
Ziyu Guo
Wei Zhang
Kunchang Li
Xupeng Miao
Tengjiao Wang
Yu Qiao
Peng Gao
Hongsheng Li
VLM3DPC
271
453
0
04 Dec 2021
CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object
  Detection
CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection
Youwei Pang
Xiaoqi Zhao
Lihe Zhang
Huchuan Lu
94
98
0
04 Dec 2021
Gesture Recognition with a Skeleton-Based Keyframe Selection Module
Gesture Recognition with a Skeleton-Based Keyframe Selection Module
Yunsoo Kim
Hyun Myung
SLR
68
1
0
03 Dec 2021
BEVT: BERT Pretraining of Video Transformers
BEVT: BERT Pretraining of Video Transformers
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu-Gang Jiang
Luowei Zhou
Lu Yuan
ViT
120
209
0
02 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
172
702
0
02 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLMCLIP
232
584
0
02 Dec 2021
Self-supervised Video Transformer
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Michael S. Ryoo
ViT
144
88
0
02 Dec 2021
Iterative Contrast-Classify For Semi-supervised Temporal Action
  Segmentation
Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation
Dipika Singhania
R. Rahaman
Angela Yao
86
24
0
02 Dec 2021
Video-Text Pre-training with Learned Regions
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
103
24
0
02 Dec 2021
Stacked Temporal Attention: Improving First-person Action Recognition by
  Emphasizing Discriminative Clips
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips
Lijin Yang
Yifei Huang
Yusuke Sugano
Yoichi Sato
111
5
0
02 Dec 2021
Vision Pair Learning: An Efficient Training Framework for Image
  Classification
Vision Pair Learning: An Efficient Training Framework for Image Classification
Bei Tong
Xiaoyuan Yu
ViT
56
0
0
02 Dec 2021
PreViTS: Contrastive Pretraining with Video Tracking Supervision
PreViTS: Contrastive Pretraining with Video Tracking Supervision
Brian Chen
Ramprasaath R. Selvaraju
Shih-Fu Chang
Juan Carlos Niebles
Nikhil Naik
ViT
85
2
0
01 Dec 2021
Routing with Self-Attention for Multimodal Capsule Networks
Routing with Self-Attention for Multimodal Capsule Networks
Kevin Duarte
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
Samuel Thomas
Alexander H. Liu
David Harwath
James R. Glass
Hilde Kuehne
M. Shah
SSL
59
5
0
01 Dec 2021
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from
  a Single Image
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Yuki M. Asano
Aaqib Saeed
94
7
0
01 Dec 2021
Graph Convolutional Module for Temporal Action Localization in Videos
Graph Convolutional Module for Temporal Action Localization in Videos
Runhao Zeng
Wenbing Huang
Mingkui Tan
Yu Rong
P. Zhao
Junzhou Huang
Chuang Gan
87
66
0
01 Dec 2021
Affect-DML: Context-Aware One-Shot Recognition of Human Affect using
  Deep Metric Learning
Affect-DML: Context-Aware One-Shot Recognition of Human Affect using Deep Metric Learning
Kunyu Peng
Alina Roitberg
David Schneider
Marios Koulakis
Kailun Yang
Rainer Stiefelhagen
CVBM
77
3
0
30 Nov 2021
Camera Distortion-aware 3D Human Pose Estimation in Video with
  Optimization-based Meta-Learning
Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning
Hanbyel Cho
Yooshin Cho
Jaemyung Yu
Junmo Kim
3DH
41
15
0
30 Nov 2021
End-to-End Referring Video Object Segmentation with Multimodal
  Transformers
End-to-End Referring Video Object Segmentation with Multimodal Transformers
Adam Botach
Evgenii Zheltonozhskii
Chaim Baskin
VOS
115
150
0
29 Nov 2021
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video
  Question Answering
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
Jingjing Jiang
Zi-yi Liu
N. Zheng
87
14
0
29 Nov 2021
Automated Detection of Patients in Hospital Video Recordings
Automated Detection of Patients in Hospital Video Recordings
Siddharth Sharma
Florian Dubost
Christopher Lee-Messer
D. Rubin
23
2
0
28 Nov 2021
Weakly-guided Self-supervised Pretraining for Temporal Activity
  Detection
Weakly-guided Self-supervised Pretraining for Temporal Activity Detection
Kumara Kahatapitiya
Zhou Ren
Haoxiang Li
Zhenyu Wu
Michael S. Ryoo
G. Hua
ViT
63
7
0
26 Nov 2021
Learning from Temporal Gradient for Semi-supervised Action Recognition
Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
89
53
0
25 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video
  Captioning
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
93
247
0
25 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
110
75
0
25 Nov 2021
V2C: Visual Voice Cloning
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
81
27
0
25 Nov 2021
Cross Your Body: A Cognitive Assessment System for Children
Cross Your Body: A Cognitive Assessment System for Children
S. Sayed
V. Athitsos
23
2
0
24 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token
  Modeling
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
157
221
0
24 Nov 2021
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal
  Representation Learning
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
David Junhao Zhang
Kunchang Li
Yali Wang
Yuxiang Chen
Shashwat Chandra
Yu Qiao
Luoqi Liu
Mike Zheng Shou
AI4TS
100
30
0
24 Nov 2021
Background-Click Supervision for Temporal Action Localization
Background-Click Supervision for Temporal Action Localization
Le Yang
Junwei Han
Tao Zhao
Tianwei Lin
Dingwen Zhang
Jianxin Chen
92
61
0
24 Nov 2021
Multi-label Iterated Learning for Image Classification with Label
  Ambiguity
Multi-label Iterated Learning for Image Classification with Label Ambiguity
Sai Rajeswar
Pau Rodríguez López
Soumye Singhal
David Vazquez
Rameswar Panda
VLM
88
31
0
23 Nov 2021
PhysFormer: Facial Video-based Physiological Measurement with Temporal
  Difference Transformer
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Philip Torr
Guoying Zhao
ViTMedIm
213
174
0
23 Nov 2021
Sparse Fusion for Multimodal Transformers
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
60
7
0
23 Nov 2021
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed
  Video Analysis
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis
Zhaobo Qi
Shuhui Wang
Chi Su
Li Su
Weigang Zhang
Qingming Huang
61
10
0
23 Nov 2021
Self-Regulated Learning for Egocentric Video Activity Anticipation
Self-Regulated Learning for Egocentric Video Activity Anticipation
Zhaobo Qi
Shuhui Wang
Chi Su
Li Su
Qingming Huang
Q. Tian
EgoV
117
52
0
23 Nov 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
68
67
0
23 Nov 2021
Auto-Encoding Score Distribution Regression for Action Quality
  Assessment
Auto-Encoding Score Distribution Regression for Action Quality Assessment
Boyu Zhang
Jiayuan Chen
Yinfei Xu
Hui Zhang
Xu Yang
Xin Geng
107
28
0
22 Nov 2021
Previous
123...404142...717273
Next