1705.07750
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Papers citing
"Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 1,508 papers shown
Title
On Negative Sampling for Audio-Visual Contrastive Learning from Movies
Mahdi M. Kalayeh
Shervin Ardeshir
Lingyi Liu
Nagendra Kamath
Ashok Chandrashekar
SSL
35
3
0
29 Apr 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
40
2
0
28 Apr 2022
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
Alexandros Stergiou
Dima Damen
AI4TS
EgoV
EDL
26
7
0
28 Apr 2022
Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training
Guanhong Wang
Ke Lu
Yang Zhou
Zhanhao He
Gaoang Wang
SSL
32
3
0
27 Apr 2022
Contrastive Language-Action Pre-training for Temporal Localization
Mengmeng Xu
Erhan Gundogdu
Maksim
Guohao Li
M. Donoser
Loris Bazzani
38
27
0
26 Apr 2022
ClothFormer: Taming Video Virtual Try-on in All Module
Jianbin Jiang
Tan Wang
He Yan
Junhui Liu
40
25
0
26 Apr 2022
Temporal Relevance Analysis for Video Action Models
Quanfu Fan
Donghyun Kim
Chun-Fu Chen
Stan Sclaroff
Kate Saenko
Sarah Adel Bargal
FAtt
33
0
0
25 Apr 2022
iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition
Yixuan Wei
Yue Cao
Zheng Zhang
Zhuliang Yao
Zhenda Xie
Han Hu
B. Guo
VLM
29
11
0
22 Apr 2022
Video Moment Retrieval from Text Queries via Single Frame Annotation
Ran Cui
Tianwen Qian
Pai Peng
E. Daskalaki
Jingjing Chen
Xiao-Wei Guo
Huyang Sun
Yu-Gang Jiang
22
35
0
20 Apr 2022
Attention in Attention: Modeling Context Correlation for Efficient Video Classification
Y. Hao
Shuo Wang
P. Cao
Xinjian Gao
Tong Xu
Jinmeng Wu
Xiangnan He
39
41
0
20 Apr 2022
Sound-Guided Semantic Video Generation
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Chanyoung Kim
Wonjae Ryoo
Sang Ho Yoon
Hyunjun Cho
Jihyun Bae
Jinkyu Kim
Sangpil Kim
VGen
38
26
0
20 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
35
0
0
17 Apr 2022
3D Convolutional Networks for Action Recognition: Application to Sport Gesture Recognition
Pierre-Etienne Martin
J. Benois-Pineau
Renaud Péteri
A. Zemmari
J. Morlier
32
5
0
13 Apr 2022
Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization
Zhixi Cai
Kalin Stefanov
Abhinav Dhall
Munawar Hayat
27
3
0
13 Apr 2022
Calibrating Class Weights with Multi-Modal Information for Partial Video Domain Adaptation
Xiyu Wang
Yuecong Xu
K. Mao
Jianfei Yang
26
8
0
13 Apr 2022
Position-aware Location Regression Network for Temporal Video Grounding
Sunoh Kim
Kimin Yun
J. Choi
27
4
0
12 Apr 2022
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition
C. Nwoye
Deepak Alapatt
Tong Yu
Armine Vardazaryan
Fangfang Xia
...
Didier Mutter
Pietro Mascagni
B. Seeliger
Cristians Gonzalez
N. Padoy
25
50
0
10 Apr 2022
Self-Supervised Video Representation Learning with Motion-Contrastive Perception
Jin-Yuan Liu
Ying Cheng
Yuejie Zhang
Ruiwei Zhao
Rui Feng
SSL
26
1
0
10 Apr 2022
Multimodal Transformer for Nursing Activity Recognition
Momal Ijaz
Renato Diaz
Chong Chen
ViT
35
26
0
09 Apr 2022
Probabilistic Representations for Video Contrastive Learning
Jungin Park
Jiyoung Lee
Ig-Jae Kim
Kwanghoon Sohn
SSL
40
44
0
08 Apr 2022
Frequency Selective Augmentation for Video Representation Learning
Jinhyung Kim
Taeoh Kim
Minho Shim
Dongyoon Han
Dongyoon Wee
Junmo Kim
AI4TS
54
3
0
08 Apr 2022
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Jinglin Xu
Yongming Rao
Xumin Yu
Guangyi Chen
Jie Zhou
Jiwen Lu
30
88
0
07 Apr 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Songwei Ge
Thomas Hayes
Harry Yang
Xiaoyue Yin
Guan Pang
David Jacobs
Jia-Bin Huang
Devi Parikh
ViT
62
215
0
07 Apr 2022
Video Diffusion Models
Jonathan Ho
Tim Salimans
Alexey A. Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet
DiffM
VGen
101
1,533
0
07 Apr 2022
Continual Inference: A Library for Efficient Online Inference with Deep Neural Networks in PyTorch
Lukas Hedegaard
Alexandros Iosifidis
BDL
3DV
CLL
17
6
0
07 Apr 2022
Detection of Distracted Driver using Convolution Neural Network
Narayana Darapaneni
Jai Arora
MoniShankar Hazra
Naman Vig
Simrandeep Singh Gandhi
Saurabh Gupta
A. Paduri
13
8
0
07 Apr 2022
Hierarchical Self-supervised Representation Learning for Movie Understanding
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
46
24
0
06 Apr 2022
Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yi Tian Xu
Xiang Wang
Mingqian Tang
Changxin Gao
Rong Jin
Nong Sang
SSL
AI4TS
33
17
0
06 Apr 2022
Video Demoireing with Relation-Based Temporal Consistency
Peng Dai
Xin Yu
Lan Ma
Baoheng Zhang
Jia Li
Wenbo Li
Jiajun Shen
Xiaojuan Qi
34
25
0
06 Apr 2022
An Empirical Study of End-to-End Temporal Action Detection
Xiaolong Liu
S. Bai
Xiang Bai
27
58
0
06 Apr 2022
Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zerui Li
Cheng Lu
Jia Qin
Chunle Guo
Ming-Ming Cheng
60
149
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
29
21
0
06 Apr 2022
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos
Shao-Wei Liu
Subarna Tripathi
Somdeb Majumdar
Xiaolong Wang
EgoV
45
93
0
04 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
56
102
0
04 Apr 2022
TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng
Gedas Bertasius
ViT
37
92
0
04 Apr 2022
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
Huazhang Hu
Sixun Dong
Yiqun Zhao
Dongze Lian
Zhengxin Li
Shenghua Gao
26
47
0
03 Apr 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
17
24
0
01 Apr 2022
Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization
Junyu Gao
Mengyuan Chen
Changsheng Xu
20
66
0
31 Mar 2022
End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection
Congcong Li
Xinyao Wang
Longyin Wen
Dexiang Hong
Tiejian Luo
Libo Zhang
30
16
0
29 Mar 2022
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
Bo He
Xitong Yang
Le Kang
Zhiyu Cheng
Xingfa Zhou
Abhinav Shrivastava
35
77
0
29 Mar 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
M. Volkovs
Animesh Garg
Guangwei Yu
44
153
0
28 Mar 2022
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
R Gnana Praveen
W. Melo
Nasib Ullah
Haseeb Aslam
Osama Zeeshan
...
M. Pedersoli
Alessandro Lameiras Koerich
Simon L Bacon
P. Cardinal
Eric Granger
30
68
0
28 Mar 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
38
205
0
28 Mar 2022
Discovering Human-Object Interaction Concepts via Self-Compositional Learning
Zhi Hou
Baosheng Yu
Dacheng Tao
27
18
0
27 Mar 2022
Class-Incremental Learning for Action Recognition in Videos
Jaeyoo Park
Minsoo Kang
Bohyung Han
CLL
24
52
0
25 Mar 2022
Learning to Adapt to Unseen Abnormal Activities under Weak Supervision
Jaeyoo Park
Junha Kim
Bohyung Han
OffRL
23
5
0
25 Mar 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Fei Wu
Yi Yang
Yueting Zhuang
Xinze Wang
44
73
0
24 Mar 2022
Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection
Hitesh Sapkota
Qi Yu
16
39
0
24 Mar 2022
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
Ye Liu
Siyuan Li
Yang Wu
C. Chen
Ying Shan
Xiaohu Qie
ViT
29
141
0
23 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
170
1,134
0
23 Mar 2022