1705.07750
Cited By
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Papers citing
"Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 3,647 papers shown
Title
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification
Rex Liu
Huan Zhang
Hamed Pirsiavash
Xin Liu
ViT
92
13
0
08 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge Belongie
Ming-Hsuan Yang
Hartwig Adam
Huayu Chen
AI4TS
96
6
0
08 Dec 2021
Prompting Visual-Language Models for Efficient Video Understanding
Chen Ju
Tengda Han
Kunhao Zheng
Ya Zhang
Weidi Xie
VPVLM
VLM
139
384
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
132
135
0
08 Dec 2021
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs
Kaifeng Gao
Long Chen
Yulei Niu
Jian Shao
Jun Xiao
68
29
0
08 Dec 2021
SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization
Wenbo Gou
Wen Shi
Jian Lou
Lijie Huang
Pan Zhou
Ruixuan Li
AAML
74
2
0
08 Dec 2021
Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
Srijan Das
Michael S. Ryoo
SSL
48
0
0
07 Dec 2021
ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints
Srijan Das
Michael S. Ryoo
SSL
90
20
0
07 Dec 2021
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Rui Dai
Srijan Das
Kumara Kahatapitiya
Michael S. Ryoo
Francois Bremond
ViT
109
73
0
07 Dec 2021
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning
Manlin Zhang
Jinpeng Wang
A. J. Ma
85
9
0
07 Dec 2021
Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection
Shoubin Yu
Zhong-Hua Zhao
Haoshu Fang
Andong Deng
Haisheng Su
Dongliang Wang
Weihao Gan
Cewu Lu
Wei Wu
99
19
0
07 Dec 2021
DCAN: Improving Temporal Action Detection via Dual Context Aggregation
Guo Chen
Yin-Dong Zheng
Limin Wang
Tong Lu
AI4TS
139
74
0
07 Dec 2021
E²(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
Chiara Plizzari
M. Planamente
Gabriele Goletto
Marco Cannici
Emanuele Gusso
Matteo Matteucci
Barbara Caputo
EgoV
104
57
0
07 Dec 2021
STSM: Spatio-Temporal Shift Module for Efficient Action Recognition
Zhaoqilin Yang
Gaoyun An
69
5
0
05 Dec 2021
PointCLIP: Point Cloud Understanding by CLIP
Renrui Zhang
Ziyu Guo
Wei Zhang
Kunchang Li
Xupeng Miao
Tengjiao Wang
Yu Qiao
Peng Gao
Hongsheng Li
VLM
3DPC
271
453
0
04 Dec 2021
CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection
Youwei Pang
Xiaoqi Zhao
Lihe Zhang
Huchuan Lu
94
98
0
04 Dec 2021
Gesture Recognition with a Skeleton-Based Keyframe Selection Module
Yunsoo Kim
Hyun Myung
SLR
68
1
0
03 Dec 2021
BEVT: BERT Pretraining of Video Transformers
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu-Gang Jiang
Luowei Zhou
Lu Yuan
ViT
120
209
0
02 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
172
702
0
02 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLM
CLIP
232
584
0
02 Dec 2021
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Michael S. Ryoo
ViT
144
88
0
02 Dec 2021
Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation
Dipika Singhania
R. Rahaman
Angela Yao
86
24
0
02 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
103
24
0
02 Dec 2021
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips
Lijin Yang
Yifei Huang
Yusuke Sugano
Yoichi Sato
111
5
0
02 Dec 2021
Vision Pair Learning: An Efficient Training Framework for Image Classification
Bei Tong
Xiaoyuan Yu
ViT
56
0
0
02 Dec 2021
PreViTS: Contrastive Pretraining with Video Tracking Supervision
Brian Chen
Ramprasaath R. Selvaraju
Shih-Fu Chang
Juan Carlos Niebles
Nikhil Naik
ViT
85
2
0
01 Dec 2021
Routing with Self-Attention for Multimodal Capsule Networks
Kevin Duarte
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
Samuel Thomas
Alexander H. Liu
David Harwath
James R. Glass
Hilde Kuehne
M. Shah
SSL
59
5
0
01 Dec 2021
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Yuki M. Asano
Aaqib Saeed
94
7
0
01 Dec 2021
Graph Convolutional Module for Temporal Action Localization in Videos
Runhao Zeng
Wenbing Huang
Mingkui Tan
Yu Rong
P. Zhao
Junzhou Huang
Chuang Gan
87
66
0
01 Dec 2021
Affect-DML: Context-Aware One-Shot Recognition of Human Affect using Deep Metric Learning
Kunyu Peng
Alina Roitberg
David Schneider
Marios Koulakis
Kailun Yang
Rainer Stiefelhagen
CVBM
77
3
0
30 Nov 2021
Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning
Hanbyel Cho
Yooshin Cho
Jaemyung Yu
Junmo Kim
3DH
41
15
0
30 Nov 2021
End-to-End Referring Video Object Segmentation with Multimodal Transformers
Adam Botach
Evgenii Zheltonozhskii
Chaim Baskin
VOS
115
150
0
29 Nov 2021
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
Jingjing Jiang
Zi-yi Liu
N. Zheng
87
14
0
29 Nov 2021
Automated Detection of Patients in Hospital Video Recordings
Siddharth Sharma
Florian Dubost
Christopher Lee-Messer
D. Rubin
23
2
0
28 Nov 2021
Weakly-guided Self-supervised Pretraining for Temporal Activity Detection
Kumara Kahatapitiya
Zhou Ren
Haoxiang Li
Zhenyu Wu
Michael S. Ryoo
G. Hua
ViT
63
7
0
26 Nov 2021
Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
89
53
0
25 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
93
247
0
25 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
110
75
0
25 Nov 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
81
27
0
25 Nov 2021
Cross Your Body: A Cognitive Assessment System for Children
S. Sayed
V. Athitsos
23
2
0
24 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
157
221
0
24 Nov 2021
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
David Junhao Zhang
Kunchang Li
Yali Wang
Yuxiang Chen
Shashwat Chandra
Yu Qiao
Luoqi Liu
Mike Zheng Shou
AI4TS
100
30
0
24 Nov 2021
Background-Click Supervision for Temporal Action Localization
Le Yang
Junwei Han
Tao Zhao
Tianwei Lin
Dingwen Zhang
Jianxin Chen
92
61
0
24 Nov 2021
Multi-label Iterated Learning for Image Classification with Label Ambiguity
Sai Rajeswar
Pau Rodríguez López
Soumye Singhal
David Vazquez
Rameswar Panda
VLM
88
31
0
23 Nov 2021
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Philip Torr
Guoying Zhao
ViT
MedIm
213
174
0
23 Nov 2021
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
60
7
0
23 Nov 2021
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis
Zhaobo Qi
Shuhui Wang
Chi Su
Li Su
Weigang Zhang
Qingming Huang
61
10
0
23 Nov 2021
Self-Regulated Learning for Egocentric Video Activity Anticipation
Zhaobo Qi
Shuhui Wang
Chi Su
Li Su
Qingming Huang
Q. Tian
EgoV
117
52
0
23 Nov 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
68
67
0
23 Nov 2021
Auto-Encoding Score Distribution Regression for Action Quality Assessment
Boyu Zhang
Jiayuan Chen
Yinfei Xu
Hui Zhang
Xu Yang
Xin Geng
107
28
0
22 Nov 2021