ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1705.06950
  4. Cited By
The Kinetics Human Action Video Dataset

The Kinetics Human Action Video Dataset

19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
ArXivPDFHTML

Papers citing "The Kinetics Human Action Video Dataset"

50 / 2,015 papers shown
Title
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li
Renping Zhou
Jiawei Zhou
Yingwei Song
Johannes Herter
Minghan Qin
Gao Huang
Hanspeter Pfister
3DGS
VLM
66
0
0
13 Mar 2025
STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications
Andrew Gao
Jun Liu
AI4TS
58
0
0
11 Mar 2025
HERO: Human Reaction Generation from Videos
Chengjun Yu
Wei-dong Zhai
Yuhang Yang
Yang Cao
Zheng-jun Zha
VGen
56
0
0
11 Mar 2025
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos
Chen-Da Liu-Zhang
Lin Sui
Shuming Liu
Fangzhou Mu
Ziyi Wang
Bernard Ghanem
52
1
0
09 Mar 2025
End-to-End Action Segmentation Transformer
Tieqiao Wang
Sinisa Todorovic
ViT
39
0
0
08 Mar 2025
Secure On-Device Video OOD Detection Without Backpropagation
Secure On-Device Video OOD Detection Without Backpropagation
Li Li
Peilin Cai
Yuxiao Zhou
Zhiyu Ni
Renjie Liang
You Qin
Yi Nian
Zhuowen Tu
Xiyang Hu
Yue Zhao
OODD
FedML
68
2
0
08 Mar 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
42
0
0
04 Mar 2025
Attention Bootstrapping for Multi-Modal Test-Time Adaptation
Yusheng Zhao
Junyu Luo
Xiao Luo
Jinsheng Huang
Jingyang Yuan
Zhiping Xiao
M. Zhang
TTA
92
0
0
04 Mar 2025
Exploring Simple Siamese Network for High-Resolution Video Quality Assessment
Guotao Shen
Ziheng Yan
Xin Jin
Longhai Wu
Jie Chen
Ilhyun Cho
Cheul-hee Hahm
42
0
0
04 Mar 2025
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou
Ke Mei
Yu Lu
Tianyi Wang
Fengyun Rao
94
2
0
03 Mar 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Baoqi Pei
Yuanmin Huang
Jilan Xu
Guo Chen
Yuping He
...
Yali Wang
Weidi Xie
Yu Qiao
Fei Wu
Limin Wang
41
0
0
02 Mar 2025
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition
Otto Brookes
Maksim Kukushkin
Majid Mirmehdi
Colleen Stephens
Paula Dieguez
...
Lukas Boesch
Thomas Schmid
M. Arandjelovic
H. Kühl
T. Burghardt
48
0
0
28 Feb 2025
Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints
Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints
Masoumeh Chapariniya
Hossein Ranjbar
Teodora Vukovic
Sarah Ebling
Volker Dellwo
3DPC
51
0
0
28 Feb 2025
AgroLLM: Connecting Farmers and Agricultural Practices through Large Language Models for Enhanced Knowledge Transfer and Practical Application
Dinesh Jackson Samuel
Inna Skarga-Bandurova
David Sikolia
Muhammad Awais
55
0
0
28 Feb 2025
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
Changyeon Kim
Minho Heo
Doohyun Lee
Jinwoo Shin
Honglak Lee
Joseph J. Lim
Kimin Lee
44
1
0
28 Feb 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
Shuming Liu
Chen Zhao
Fatimah Zohra
Mattia Soldan
Alejandro Pardo
...
Juan Carlos León Alcázar
A. Cioppa
Silvio Giancola
Carlos Hinojosa
Bernard Ghanem
68
3
0
27 Feb 2025
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu
Congqi Cao
Yifan Zhang
Yanning Zhang
VLM
43
0
0
27 Feb 2025
Balanced Representation Learning for Long-tailed Skeleton-based Action Recognition
Balanced Representation Learning for Long-tailed Skeleton-based Action Recognition
Hongda Liu
Yunlong Wang
Min Ren
Junxing Hu
Zhengquan Luo
Guangqi Hou
Zhe Sun
55
0
0
24 Feb 2025
Fine-Grained Video Captioning through Scene Graph Consolidation
Fine-Grained Video Captioning through Scene Graph Consolidation
Sanghyeok Chu
Seonguk Seo
Bohyung Han
57
1
0
23 Feb 2025
Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Mengqi He
Jing Zhang
VLM
48
0
0
19 Feb 2025
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
Sihyun Yu
Meera Hahn
Dan Kondratyuk
Jinwoo Shin
Agrim Gupta
José Lezama
Irfan Essa
David A. Ross
Jonathan Huang
DiffM
VGen
77
0
0
18 Feb 2025
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
Junhyeok Kim
Min Soo Kim
Jiwan Chung
Jungbin Cho
Jisoo Kim
Sungwoong Kim
Gyeongbo Sim
Youngjae Yu
EgoV
60
0
0
17 Feb 2025
Improving action segmentation via explicit similarity measurement
Improving action segmentation via explicit similarity measurement
Kamel Aouaidjia
Wenhao Zhang
Aofan Li
Chongsheng Zhang
44
0
0
15 Feb 2025
Janus: Collaborative Vision Transformer Under Dynamic Network Environment
Janus: Collaborative Vision Transformer Under Dynamic Network Environment
Linyi Jiang
Silvery Fu
Yifei Zhu
Bo Li
ViT
203
0
0
14 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
119
0
0
12 Feb 2025
A Survey on Mamba Architecture for Vision Applications
A Survey on Mamba Architecture for Vision Applications
Fady Ibrahim
Guangjun Liu
Guanghui Wang
Mamba
62
2
0
11 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
45
0
0
11 Feb 2025
History-Guided Video Diffusion
Kiwhan Song
Boyuan Chen
Max Simchowitz
Yilun Du
Russ Tedrake
Vincent Sitzmann
VGen
117
7
0
10 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou
Peipei Li
Zekun Li
Huaibo Huang
Xing Cui
Xuannan Liu
Chenghanyu Zhang
Ran He
DeLMO
128
2
0
07 Feb 2025
BRIDLE: Generalized Self-supervised Learning with Quantization
BRIDLE: Generalized Self-supervised Learning with Quantization
Hoang M. Nguyen
Satya Narayan Shukla
Qiang Zhang
Hanchao Yu
Sreya D. Roy
Taipeng Tian
Lingjiong Zhu
Yuchen Liu
SSL
MQ
84
0
0
04 Feb 2025
Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao
Zhao Song
Chiwun Yang
VGen
46
2
0
01 Feb 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Can masking background and object reduce static bias for zero-shot action recognition?
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
43
0
0
22 Jan 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
91
22
0
21 Jan 2025
Human Activity Recognition in an Open World
Human Activity Recognition in an Open World
D. Prijatelj
Samuel Grieggs
Jin Huang
Dawei Du
Ameya Shringi
Christopher Funk
Adam Kaufman
Eric Robertson
Walter J. Scheirer University of Notre Dame
72
3
0
17 Jan 2025
A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
Naval Kishore Mehta
Arvind
Himanshu Kumar
Abeer Banerjee
Sumeet Saurav
Sanjay Singh
44
0
0
10 Jan 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Luigi Seminara
G. Farinella
Antonino Furnari
64
8
0
10 Jan 2025
MS-Temba : Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
MS-Temba : Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
55
3
0
10 Jan 2025
MLVU: Benchmarking Multi-task Long Video Understanding
MLVU: Benchmarking Multi-task Long Video Understanding
Yueze Wang
Yan Shu
Bo Zhao
Boya Wu
Junjie Zhou
...
Xi Yang
Y. Xiong
Bo Zhang
Tiejun Huang
Zheng Liu
VLM
58
11
0
03 Jan 2025
Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection
Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection
Ayush Ghadiya
P. Kar
Vishal M. Chudasama
Pankaj Wasnik
48
1
0
31 Dec 2024
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
79
25
0
31 Dec 2024
Finger in Camera Speaks Everything: Unconstrained Air-Writing for
  Real-World
Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World
Meiqi Wu
Kaiqi Huang
Yuanqiang Cai
Shiyu Hu
Yuzhong Zhao
Weiqiang Wang
VGen
40
1
0
27 Dec 2024
Sensitive Image Classification by Vision Transformers
Sensitive Image Classification by Vision Transformers
Hanxian He
Campbell Wilson
Thanh Thi Nguyen
Janis Dalins
ViT
89
0
0
21 Dec 2024
LEARN: A Unified Framework for Multi-Task Domain Adapt Few-Shot Learning
LEARN: A Unified Framework for Multi-Task Domain Adapt Few-Shot Learning
Bharadwaj Ravichandran
Alexander Lynch
S. Brockman
Brandon RichardWebster
Dawei Du
A. Hoogs
Christopher Funk
ObjD
VLM
73
0
0
20 Dec 2024
Query-centric Audio-Visual Cognition Network for Moment Retrieval,
  Segmentation and Step-Captioning
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Qingming Huang
80
0
0
18 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
79
0
0
18 Dec 2024
Do Language Models Understand Time?
Do Language Models Understand Time?
Xi Ding
Lei Wang
184
0
0
18 Dec 2024
Gramian Multimodal Representation Learning and Alignment
Gramian Multimodal Representation Learning and Alignment
Giordano Cicchetti
Eleonora Grassucci
Luigi Sigillo
Danilo Comminiello
94
1
0
16 Dec 2024
Training Strategies for Isolated Sign Language Recognition
Training Strategies for Isolated Sign Language Recognition
Karina Kvanchiani
Roman Kraynov
Elizaveta Petrova
Petr Surovcev
Aleksandr Nagaev
A. Kapitanov
84
1
0
16 Dec 2024
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
Yulin Wang
Haoji Zhang
Yang Yue
Shiji Song
Chao Deng
Junlan Feng
Gao Huang
86
3
0
15 Dec 2024
Policy-shaped prediction: avoiding distractions in model-based
  reinforcement learning
Policy-shaped prediction: avoiding distractions in model-based reinforcement learning
Miles Hutson
Isaac Kauvar
Nick Haber
75
0
0
08 Dec 2024
Previous
12345...394041
Next