arXiv:1705.06950
The Kinetics Human Action Video Dataset
19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
Papers citing "The Kinetics Human Action Video Dataset" (50 of 2,015 papers shown)
Video Instruction Tuning With Synthetic Data
Yuanhan Zhang
Jinming Wu
Wei Li
Bo Li
Zejun Ma
Ziwei Liu
Chunyuan Li
SyDa
VGen
55
143
0
03 Oct 2024
Computer-aided Colorization State-of-the-science: A Survey
Yu Cao
Xin Duan
Xiangqiao Meng
P. Y. Mok
Ping Li
Tong-Yee Lee
30
0
0
03 Oct 2024
An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos
Arun V. Reddy
Ketul Shah
Corban Rivera
William Paul
Celso M. De Melo
Rama Chellappa
SLR
36
0
0
03 Oct 2024
COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation
Mingzhen Sun
Weining Wang
Xinxin Zhu
Jing Liu
VGen
DiffM
31
0
0
02 Oct 2024
Tracking objects that change in appearance with phase synchrony
Sabine Muzellec
Drew Linsley
A. Ashok
E. Mingolla
Girik Malik
Rufin VanRullen
Thomas Serre
31
1
0
02 Oct 2024
Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion
Dexuan Ding
Lei Wang
Liyun Zhu
Tom Gedeon
Piotr Koniusz
42
4
0
02 Oct 2024
Delving Deep into Engagement Prediction of Short Videos
Dasong Li
Wenjie Li
Baili Lu
Hongsheng Li
Sizhuo Ma
Gurunandan Krishnan
Jian Wang
34
0
0
30 Sep 2024
REST-HANDS: Rehabilitation with Egocentric Vision Using Smartglasses for Treatment of Hands after Surviving Stroke
Wiktor Mucha
Kentaro Tanaka
M. Kampel
42
0
0
30 Sep 2024
CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis
Nishq Poorav Desai
Ali Etemad
Michael A. Greenspan
40
0
0
30 Sep 2024
Fast Encoding and Decoding for Implicit Video Representation
Hao Chen
Saining Xie
Ser-Nam Lim
Abhinav Shrivastava
31
1
0
28 Sep 2024
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
Min Yang
Zichen Zhang
Limin Wang
AI4TS
39
0
0
27 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
39
0
0
26 Sep 2024
Subjective and Objective Quality-of-Experience Evaluation Study for Live Video Streaming
Zehao Zhu
Wei Sun
Jun Jia
Wei Wu
Sibin Deng
Kai Li
Ying-Cong Chen
Xiongkuo Min
Jia Wang
Guangtao Zhai
33
0
0
26 Sep 2024
EAGLE: Egocentric AGgregated Language-video Engine
Jing Bi
Yunlong Tang
Luchuan Song
A. Vosoughi
Nguyen Nguyen
Chenliang Xu
48
8
0
26 Sep 2024
Towards Synthetic Data Generation for Improved Pain Recognition in Videos under Patient Constraints
Jonas Nasimzada
Jens Kleesiek
Ken Herrmann
Alina Roitberg
C. Seibold
22
0
0
24 Sep 2024
Self-Supervised Any-Point Tracking by Contrastive Random Walks
Ayush Shrivastava
Andrew Owens
38
3
0
24 Sep 2024
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
Yuxiao Chen
Keqin Li
Wentao Bao
Deep Patel
Yu Kong
Martin Renqiang Min
Dimitris N. Metaxas
DiffM
41
1
0
22 Sep 2024
Detecting Inpainted Video with Frequency Domain Insights
Quanhui Tang
Jingtao Cao
18
0
0
21 Sep 2024
Across-Game Engagement Modelling via Few-Shot Learning
Kosmas Pinitas
Konstantinos Makantasis
Georgios N. Yannakakis
31
1
0
19 Sep 2024
Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment
Ohad Cohen
Gershon Hazan
Sharon Gannot
29
0
0
14 Sep 2024
ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild
Arya Farkhondeh
Samy Tafasca
J. Odobez
27
0
0
14 Sep 2024
Data Collection-free Masked Video Modeling
Yuchi Ishikawa
Masayoshi Kondo
Yoshimitsu Aoki
ViT
19
1
0
10 Sep 2024
ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition
Shiting Xiao
Yuhang Li
Youngeun Kim
Donghyun Lee
Priyadarshini Panda
44
1
0
03 Sep 2024
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
Ishan Rajendrakumar Dave
Fabian Caba Heilbron
Mubarak Shah
Simon Jenni
46
1
0
02 Sep 2024
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
Shuai Peng
Di Fu
Baole Wei
Yong Cao
Liangcai Gao
Zhi Tang
ViT
45
1
0
30 Aug 2024
Text-Enhanced Zero-Shot Action Recognition: A training-free approach
Massimo Bosetti
Shibingfeng Zhang
Benedetta Liberatori
Giacomo Zara
Elisa Ricci
Paolo Rota
VLM
49
0
0
29 Aug 2024
Online pre-training with long-form videos
Itsuki Kato
Kodai Kamiya
Toru Tamaki
OnRL
45
0
0
28 Aug 2024
Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis
Sijie Mai
Yu Zhao
Ying Zeng
Jianhua Yao
Haifeng Hu
36
2
0
28 Aug 2024
Fine-grained length controllable video captioning with ordinal embeddings
Tomoya Nitta
Takumi Fukuzawa
Toru Tamaki
48
0
0
27 Aug 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models
Zejia Weng
Xitong Yang
Zhen Xing
Zuxuan Wu
Yu-Gang Jiang
VGen
DiffM
47
5
0
27 Aug 2024
SurGen: Text-Guided Diffusion Model for Surgical Video Generation
Joseph Cho
Samuel Schmidgall
C. Zakka
Mrudang Mathur
Dhamanpreet Kaur
R. Shad
W. Hiesinger
VGen
MedIm
31
6
0
26 Aug 2024
HabitAction: A Video Dataset for Human Habitual Behavior Recognition
Hongwu Li
Zhenliang Zhang
Wei Wang
30
0
0
24 Aug 2024
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Can Qin
Congying Xia
Krithika Ramakrishnan
Michael S Ryoo
Lifu Tu
...
Silvio Savarese
Juan Carlos Niebles
Zeyuan Chen
Ran Xu
Caiming Xiong
VGen
DiffM
76
2
0
22 Aug 2024
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Yunlong Tang
Gen Zhan
Li Yang
Yiting Liao
Chenliang Xu
VGen
DiffM
LRM
53
8
0
21 Aug 2024
E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment
Shangkun Sun
Xiaoyu Liang
S. Fan
Wenxu Gao
Wei-Nan Gao
DiffM
58
0
0
21 Aug 2024
TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning
Bin Wang
Wenqian Wang
VLM
37
1
0
20 Aug 2024
Flatten: Video Action Recognition is an Image Classification task
Junlin Chen
Chengcheng Xu
Yangfan Xu
Jian Yang
Jun Yu Li
Zhiping Shi
39
1
0
17 Aug 2024
Continuous Perception Benchmark
Zeyu Wang
Zhenzhen Weng
Serena Yeung-Levy
VLM
39
0
0
15 Aug 2024
Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
Shizhou Zhang
Wenlong Luo
De-Chun Cheng
Qingchun Yang
Lingyan Ran
Yinghui Xing
Yanning Zhang
VOS
39
3
0
14 Aug 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Guozhen Zhang
Jingyu Liu
Shengming Cao
Xiaotong Zhao
Kevin Zhao
Kai Ma
Limin Wang
ViT
29
1
0
13 Aug 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
72
6
0
13 Aug 2024
OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
Mushui Liu
Bozheng Li
Yunlong Yu
VLM
28
10
0
12 Aug 2024
Efficient Test-Time Prompt Tuning for Vision-Language Models
Yuhan Zhu
Guozhen Zhang
Chen Xu
Haocheng Shen
Xiaoxin Chen
Gangshan Wu
Limin Wang
VLM
42
2
0
11 Aug 2024
Surgformer: Surgical Transformer with Hierarchical Temporal Attention for Surgical Phase Recognition
Shu Yang
Luyang Luo
Qiong Wang
Hao Chen
MedIm
41
9
0
07 Aug 2024
ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval
Ruixiang Zhao
Jian Jia
Yan Li
Xuehan Bai
Quan Chen
Han Li
Peng Jiang
Xirong Li
44
0
0
06 Aug 2024
Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection
Taichi Nishimura
Shota Nakada
Hokuto Munakata
Tatsuya Komatsu
VLM
34
1
0
06 Aug 2024
GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
Manu S. Pillai
Mamshad Nayeem Rizve
M. Shah
51
2
0
05 Aug 2024
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng
Jun Wang
Chuanhao Li
Quanfeng Lu
Hao Tian
...
Jifeng Dai
Ping Luo
Kaipeng Zhang
Wenqi Shao
VLM
60
18
0
05 Aug 2024
Cross-Modality Clustering-based Self-Labeling for Multimodal Data Classification
P. Zyblewski
Leandro L. Minku
27
0
0
05 Aug 2024
VidModEx: Interpretable and Efficient Black Box Model Extraction for High-Dimensional Spaces
Somnath Sendhil Kumar
Yuvaraj Govindarajulu
Pavan Kulkarni
Manojkumar Somabhai Parmar
FAtt
46
0
0
04 Aug 2024