Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1808.01340
Cited By
A Short Note about Kinetics-600
3 August 2018
João Carreira
Eric Noland
Andras Banki-Horvath
Chloe Hillier
Andrew Zisserman
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Short Note about Kinetics-600"
50 / 114 papers shown
Title
VidConv: A modernized 2D ConvNet for Efficient Video Recognition
Chuong H. Nguyen
Su Huynh
Vinh Nguyen
Ngoc-Khanh Nguyen
ViT
27
3
0
08 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
103
93
0
04 Jul 2022
(Un)likelihood Training for Interpretable Embedding
Jiaxin Wu
Chong-Wah Ngo
W. Chan
Zhijian Hou
14
2
0
01 Jul 2022
One-stage Action Detection Transformer
Lijun Li
Lian Zhuo
Bangyin Zhang
ViT
32
0
0
21 Jun 2022
Diffusion Models for Video Prediction and Infilling
Tobias Hoppe
Arash Mehrjou
Stefan Bauer
Didrik Nielsen
Andrea Dittadi
DiffM
VGen
38
131
0
15 Jun 2022
Cascaded Video Generation for Videos In-the-Wild
Lluis Castrejon
Nicolas Ballas
Aaron Courville
VGen
37
0
0
01 Jun 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
256
570
0
29 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
Bryan Seybold
John F. Canny
28
6
0
12 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
85
1,262
0
04 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
Alexandros Stergiou
Dima Damen
AI4TS
EgoV
EDL
17
7
0
28 Apr 2022
Video Diffusion Models
Jonathan Ho
Tim Salimans
Alexey A. Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet
DiffM
VGen
70
1,523
0
07 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
46
102
0
04 Apr 2022
Transformers Meet Visual Learning Understanding: A Comprehensive Review
Yuting Yang
Licheng Jiao
Xuantong Liu
F. Liu
Shuyuan Yang
Zhixi Feng
Xu Tang
ViT
MedIm
27
28
0
24 Mar 2022
Transframer: Arbitrary Frame Prediction with Generative Models
C. Nash
João Carreira
Jacob Walker
Iain Barr
Andrew Jaegle
Mateusz Malinowski
Peter W. Battaglia
ViT
27
37
0
17 Mar 2022
End-to-End Semantic Video Transformer for Zero-Shot Action Recognition
Keval Doshi
Yasin Yılmaz
ViT
35
2
0
10 Mar 2022
HiP: Hierarchical Perceiver
João Carreira
Skanda Koppula
Daniel Zoran
Adrià Recasens
Catalin Ionescu
...
M. Botvinick
Oriol Vinyals
Karen Simonyan
Andrew Zisserman
Andrew Jaegle
VLM
31
14
0
22 Feb 2022
Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks
Sihyun Yu
Jihoon Tack
Sangwoo Mo
Hyunsu Kim
Junho Kim
Jung-Woo Ha
Jinwoo Shin
DiffM
VGen
40
199
0
21 Feb 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
162
360
0
24 Jan 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
48
198
0
20 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
24
103
0
16 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
47
238
0
12 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
48
207
0
07 Jan 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
100
655
0
16 Dec 2021
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
28
54
0
14 Dec 2021
RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR
Yuyin Zhou
Shih-Cheng Huang
Jason Alan Fries
Alaa Youssef
T. Amrhein
...
Imon Banerjee
D. Rubin
Lei Xing
N. Shah
M. Lungren
32
43
0
23 Nov 2021
Multi-Modal Pre-Training for Automated Speech Recognition
David M. Chan
Shalini Ghosh
D. Chakrabarty
Björn Hoffmeister
SSL
30
16
0
12 Oct 2021
Revisiting 3D ResNets for Video Recognition
Xianzhi Du
Yeqing Li
Huayu Chen
Rui Qian
Jing Li
Irwan Bello
56
17
0
03 Sep 2021
Weakly Supervised Attention Model for RV StrainClassification from volumetric CTPA Scans
Noa Cahan
E. Marom
S. Soffer
Y. Barash
Eli Konen
Eyal Klang
H. Greenspan
42
8
0
26 Jul 2021
Towards Long-Form Video Understanding
Chaoxia Wu
Philipp Krahenbuhl
VLM
ViT
49
166
0
21 Jun 2021
Augmented 2D-TAN: A Two-stage Approach for Human-centric Spatio-Temporal Video Grounding
Chaolei Tan
Zihang Lin
Jianfang Hu
Xiang Li
Weishi Zheng
22
9
0
20 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
40
11
0
18 Jun 2021
Rethinking Transfer Learning for Medical Image Classification
Le Peng
Hengyue Liang
Gaoxiang Luo
Taihui Li
Ju Sun
VLM
LM&MA
16
5
0
09 Jun 2021
Hierarchical Video Generation for Complex Data
Lluis Castrejon
Nicolas Ballas
Aaron Courville
VGen
22
4
0
04 Jun 2021
Continual 3D Convolutional Neural Networks for Real-time Processing of Videos
Lukas Hedegaard
Alexandros Iosifidis
3DPC
23
14
0
31 May 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
39
257
0
29 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
Bo Xiong
Haoqi Fan
Kristen Grauman
Christoph Feichtenhofer
SSL
22
49
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
33
127
0
30 Mar 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
30
2,093
0
29 Mar 2021
Skeleton Aware Multi-modal Sign Language Recognition
Songyao Jiang
Bin Sun
Lichen Wang
Yue Bai
Kunpeng Li
Y. Fu
SLR
33
166
0
16 Mar 2021
Predicting Video with VQVAE
Jacob Walker
Ali Razavi
Aaron van den Oord
DRL
24
67
0
02 Mar 2021
Predicting post-operative right ventricular failure using video-based deep learning
R. Shad
Nicolas Quach
R. Fong
P. Kasinpila
C. Bowles
...
Y. Woo
J. Teuteberg
John P. Cunningham
C. Langlotz
W. Hiesinger
22
40
0
28 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,989
0
09 Feb 2021
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
38
185
0
11 Dec 2020
A Short Note on the Kinetics-700-2020 Human Action Dataset
Lucas Smaira
João Carreira
Eric Noland
Ellen Clancy
Amy Wu
Andrew Zisserman
10
137
0
21 Oct 2020
Learning to Compress Videos without Computing Motion
Meixu Chen
T. Goodall
Anjul Patney
A. Bovik
12
9
0
29 Sep 2020
Spatiotemporal Contrastive Video Representation Learning
Rui Qian
Tianjian Meng
Boqing Gong
Ming-Hsuan Yang
Haoran Wang
Serge J. Belongie
Huayu Chen
SSL
AI4TS
41
492
0
09 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
40
48
0
29 Jul 2020
Representation Learning with Video Deep InfoMax
R. Devon Hjelm
Philip Bachman
SSL
MDE
26
28
0
27 Jul 2020
Previous
1
2
3
Next