arXiv: 1705.07750
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 1,513 papers shown
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
105
94
0
04 Jul 2022
OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification
Ye Liu
Lingfeng Qiao
Di Yin
Zhuoxuan Jiang
Xinghua Jiang
Deqiang Jiang
Bo Ren
26
7
0
04 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
45
2
0
02 Jul 2022
Turning to a Teacher for Timestamp Supervised Temporal Action Segmentation
Yang Zhao
Yan Song
25
3
0
02 Jul 2022
Video + CLIP Baseline for Ego4D Long-term Action Anticipation
Srijan Das
Michael S. Ryoo
VLM
CLIP
29
17
0
01 Jul 2022
Dissecting Self-Supervised Learning Methods for Surgical Computer Vision
Sanat Ramesh
V. Srivastav
Deepak Alapatt
Tong Yu
Aditya Murali
...
Saurav Sharma
A. Fleurentin
Georgios Exarchakis
Alexandros Karargyris
N. Padoy
41
43
0
01 Jul 2022
COVID Detection and Severity Prediction with 3D-ConvNeXt and Custom Pretrainings
Daniel Kienzle
Julian Lorenz
Robin Schon
K. Ludwig
Rainer Lienhart
3DPC
40
14
0
30 Jun 2022
Timestamp-Supervised Action Segmentation with Graph Convolutional Networks
Hamza Khan
S. Haresh
Awais Ahmed
Shakeeb Siddiqui
Andrey Konin
Mohammad Zeeshan
Quoc-Huy Tran
27
22
0
30 Jun 2022
Programmatic Concept Learning for Human Motion Description and Synthesis
Sumith Kulal
Jiayuan Mao
A. Aiken
Jiajun Wu
33
7
0
27 Jun 2022
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
Kashu Yamazaki
Sang Truong
Khoa T. Vo
Michael Kidd
Chase Rainwater
Khoa Luu
Ngan Le
VLM
CoGe
15
25
0
26 Jun 2022
Video Activity Localisation with Uncertainties in Temporal Boundary
Jiabo Huang
Hailin Jin
S. Gong
Yang Liu
29
23
0
26 Jun 2022
Learning to Refactor Action and Co-occurrence Features for Temporal Action Localization
Kun Xia
Le Wang
Sanping Zhou
Nanning Zheng
Wei Tang
44
36
0
23 Jun 2022
Motion Gait: Gait Recognition via Motion Excitation
Yunpeng Zhang
Zhengyou Wang
Shanna Zhuang
Hui Wang
CVBM
21
1
0
22 Jun 2022
Weakly-Supervised Temporal Action Localization by Progressive Complementary Learning
Jiachen Du
Jialuo Feng
Kun-Yu Lin
Fa-Ting Hong
Xiao-Ming Wu
Zhongang Qi
Ying Shan
Weihao Zheng
52
5
0
22 Jun 2022
Context-aware Proposal Network for Temporal Action Detection
Xiang Wang
Han Zhang
Shiwei Zhang
Changxin Gao
Yuanjie Shao
Nong Sang
24
2
0
18 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
45
132
0
18 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
73
354
0
17 Jun 2022
Going Deeper than Tracking: a Survey of Computer-Vision Based Recognition of Animal Pain and Affective States
Sofia Broomé
Marcelo Feighelstein
Anna Zamansky
Gabriel Carreira Lencioni
P. Andersen
Francisca Pessanha
M. Mahmoud
Hedvig Kjellström
A. A. Salah
43
11
0
16 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
45
98
0
16 Jun 2022
Analysis and Extensions of Adversarial Training for Video Classification
K. A. Kinfu
René Vidal
AAML
33
13
0
16 Jun 2022
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks
Shanghua Gao
Zhong-Yu Li
Qi Han
Ming-Ming Cheng
Liang Wang
44
35
0
14 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
42
15
0
13 Jun 2022
NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression Recognition
Hanting Li
Ming-Fa Sui
Zhaoqing Zhu
Feng Zhao
32
27
0
10 Jun 2022
GateHUB: Gated History Unit with Background Suppression for Online Action Detection
Junwen Chen
Gaurav Mittal
Ye Yu
Yu Kong
Mei Chen
54
33
0
09 Jun 2022
Spatial-temporal Concept based Explanation of 3D ConvNets
Yi Ji
Yu Wang
K. Mori
Jien Kato
3DPC
FAtt
32
7
0
09 Jun 2022
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
Zihan Ding
Tianrui Hui
Junshi Huang
Xiaoming Wei
Jizhong Han
Si Liu
VOS
33
52
0
08 Jun 2022
Generating Long Videos of Dynamic Scenes
Tim Brooks
Janne Hellsten
M. Aittala
Ting-Chun Wang
Timo Aila
J. Lehtinen
Xuan Li
Alexei A. Efros
Tero Karras
SyDa
9
101
0
07 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
24
111
0
07 Jun 2022
A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information
M. Kowal
Mennatullah Siam
Md. Amirul Islam
Neil D. B. Bruce
Richard P. Wildes
Konstantinos G. Derpanis
26
25
0
06 Jun 2022
3D Convolutional with Attention for Action Recognition
Labina Shrestha
Shikha Dubey
Farrukh Olimov
M. Rafique
M. Jeon
29
0
0
05 Jun 2022
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation
Mingjie Li
Wenjia Cai
Karin Verspoor
Shirui Pan
Xiaodan Liang
Xiaojun Chang
MedIm
41
35
0
04 Jun 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
41
158
0
03 Jun 2022
Egocentric Video-Language Pretraining
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
...
Hongfa Wang
Dima Damen
Guohao Li
Wei Liu
Mike Zheng Shou
VLM
EgoV
46
191
0
03 Jun 2022
A temporal chrominance trigger for clean-label backdoor attack against anti-spoof rebroadcast detection
Wei Guo
B. Tondi
Mauro Barni
AAML
24
13
0
02 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
47
53
0
02 Jun 2022
Cascaded Video Generation for Videos In-the-Wild
Lluis Castrejon
Nicolas Ballas
Aaron Courville
VGen
37
0
0
01 Jun 2022
Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines
Camilo Luciano Fosco
Emilie Josephs
A. Andonian
Allen Lee
Xi Wang
A. Oliva
49
4
0
01 Jun 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
261
571
0
29 May 2022
Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning
Yanxing Song
Jianzong Wang
Tianbo Wu
Zhangcheng Huang
Jing Xiao
CVBM
42
2
0
29 May 2022
Do we really need temporal convolutions in action segmentation?
Dazhao Du
Fuchun Sun
Yu Li
Zhongang Qi
Hui Xiong
Ying Shan
ViT
39
16
0
26 May 2022
Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
Manuel Traub
S. Otte
Tobias Menge
Matthias Karlbauer
Jannik Thummel
Martin Volker Butz
41
20
0
26 May 2022
GL-RG: Global-Local Representation Granularity for Video Captioning
Liqi Yan
Qifan Wang
Yiming Cui
Fuli Feng
Xiaojun Quan
Xinming Zhang
Dongfang Liu
31
59
0
22 May 2022
Structured Attention Composition for Temporal Action Localization
Le Yang
Junwei Han
Tao Zhao
Nian Liu
Dingwen Zhang
42
17
0
20 May 2022
Cross-Enhancement Transformer for Action Segmentation
Jiahui Wang
Zhenyou Wang
Shanna Zhuang
Hui Wang
ViT
59
23
0
19 May 2022
PYSKL: Towards Good Practices for Skeleton Action Recognition
Haodong Duan
Jiaqi Wang
Kai-xiang Chen
Dahua Lin
VLM
35
137
0
19 May 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
132
62
0
17 May 2022
ETAD: Training Action Detection End to End on a Laptop
Shuming Liu
Mengmeng Xu
Chen Zhao
Xu Zhao
Guohao Li
44
6
0
14 May 2022
Scaling up sign spotting through sign language dictionaries
Gül Varol
Liliane Momeni
Samuel Albanie
Triantafyllos Afouras
Andrew Zisserman
29
14
0
09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
21
122
0
08 May 2022
An Empirical Study on Activity Recognition in Long Surgical Videos
Zhuohong He
A. Mottaghi
Aidean Sharghi
Muhammad Abdullah Jamal
Omid Mohareri
36
12
0
05 May 2022