The Kinetics Human Action Video Dataset


19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman

Papers citing "The Kinetics Human Action Video Dataset"

50 / 2,015 papers shown
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
Zihua Wang
Ruibo Li
Haozhe Du
Joey Tianyi Zhou
Yu Zhang
Xu Yang
MLLM
17
0
0
19 May 2025
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation
Teli Ma
Jia Zheng
Zifan Wang
Ziyao Gao
Jiaming Zhou
Junwei Liang
2
0
0
17 May 2025
A Fourier Space Perspective on Diffusion Models
Fabian Falck
Teodora Pandeva
Kiarash Zahirnia
Rachel Lawrence
Richard Turner
Edward Meeds
Javier Zazo
Sushrut Karmalkar
DiffM
MedIm
14
0
0
16 May 2025
Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models
Keunwoo Peter Yu
Joyce Chai
MLLM
VLM
12
0
0
16 May 2025
Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
Xiang He
Dongcheng Zhao
Yang Li
Qingqun Kong
Xin Yang
Yi Zeng
26
0
0
15 May 2025
SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity
Shihao Zou
Qingfeng Li
Wei Ji
Jingjing Li
Yongkui Yang
Guoqi Li
Chao Dong
29
0
0
15 May 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
26
0
0
14 May 2025
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Ayush K. Rai
Kyle Min
Tarun Krishna
Feiyan Hu
Alan F. Smeaton
Noel E. O'Connor
VGen
31
0
0
13 May 2025
Video Dataset Condensation with Diffusion Models
Zhe Li
Hadrien Reynaud
Mischa Dombrowski
Sarah Cechnicka
Franciskus Xaverius Erick
Bernhard Kainz
DD
VGen
52
0
0
10 May 2025
Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study
Tamim Ahmed
Thanassis Rikakis
34
0
0
03 May 2025
Vehicular Communication Security: Multi-Channel and Multi-Factor Authentication
Marco De Vincenzi
Shri Kiran Srinivasan
Chen Bo Calvin Zhang
Manuel Garcia
Shaozu Ding
Chiara Bodei
Ilaria Matteucci
Dajiang Suo
53
0
0
01 May 2025
CoCoDiff: Diversifying Skeleton Action Features via Coarse-Fine Text-Co-Guided Latent Diffusion
Zhifu Zhao
Hanyang Hua
Jiajian Li
Shaoxin Wu
Fu Li
Yangtao Zhou
Yang Li
DiffM
68
0
0
30 Apr 2025
MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment
Yachun Mi
Yu Li
Weicheng Meng
Chong Chen
Chen Hui
Shaohui Liu
36
0
0
22 Apr 2025
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture
Meng Cui
Xianghu Yue
Xinyuan Qian
Jinzheng Zhao
Haohe Liu
Xubo Liu
Daoliang Li
Wenwu Wang
34
0
0
21 Apr 2025
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
Ziyi Liu
Yong-Jin Liu
26
0
0
21 Apr 2025
PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition
Jongseo Lee
Wooil Lee
Gyeong-Moon Park
Seong Tae Kim
Jinwoo Choi
35
0
0
17 Apr 2025
SkeletonX: Data-Efficient Skeleton-based Action Recognition via Cross-sample Feature Aggregation
Zongye Zhang
Wenrui Cai
Qingjie Liu
Yanjie Wang
34
0
0
16 Apr 2025
Co-STAR: Collaborative Curriculum Self-Training with Adaptive Regularization for Source-Free Video Domain Adaptation
Amirhossein Dadashzadeh
Parsa Esmati
Majid Mirmehdi
TTA
VLM
57
0
0
15 Apr 2025
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Haoran Hao
Jiaming Han
Yiyuan Zhang
Xiangyu Yue
36
0
0
14 Apr 2025
F$^3$Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
Zhaoyu Liu
Kan Jiang
Murong Ma
Zhé Hóu
Yun Lin
Jin Song Dong
37
0
0
11 Apr 2025
Exploring Ordinal Bias in Action Recognition for Instructional Videos
Joochan Kim
Minjoon Jung
Byoung-Tak Zhang
30
0
0
09 Apr 2025
RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism
E. Peruzzo
Dejia Xu
Xingqian Xu
Humphrey Shi
N. Sebe
DiffM
VGen
59
0
0
09 Apr 2025
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Piyush Bagad
Hazel Doughty
Bernard Ghanem
Cees G. M. Snoek
ViT
SSL
52
0
0
08 Apr 2025
AsyReC: A Multimodal Graph-based Framework for Spatio-Temporal Asymmetric Dyadic Relationship Classification
Wang Tang
Fethiye Irmak Dogan
Linbo Qing
Hatice Gunes
37
0
0
07 Apr 2025
Video-Bench: Human-Aligned Video Generation Benchmark
Hui Han
Siyuan Li
Jiaqi Chen
Yiwen Yuan
Yuling Wu
...
Yongbin Li
Jun Zhang
Chi Zhang
Li Li
Yongxin Ni
EGVM
VGen
73
0
0
07 Apr 2025
Studying Image Diffusion Features for Zero-Shot Video Object Segmentation
Thanos Delatolas
Vicky S. Kalogeiton
Dim P. Papadopoulos
DiffM
VOS
50
1
0
07 Apr 2025
3D Scene Understanding Through Local Random Access Sequence Modeling
Wanhee Lee
Klemen Kotar
R. Venkatesh
Jared Watrous
Honglin Chen
Khai Loong Aw
Daniel L. K. Yamins
3DV
42
0
0
04 Apr 2025
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
Yusheng Zhao
Junyu Luo
Zhiyuan Ning
Weizhi Zhang
Zhiping Xiao
Wei Ju
Philip S. Yu
Ming Zhang
AuLLM
49
0
0
03 Apr 2025
SocialGesture: Delving into Multi-person Gesture Understanding
Xu Cao
Pranav Virupaksha
Wenqi Jia
Bolin Lai
Fiona Ryan
Sangmin Lee
James M. Rehg
SLR
56
0
0
03 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
57
0
0
02 Apr 2025
Learning from Streaming Video with Orthogonal Gradients
Tengda Han
Dilara Gokay
Joseph Heyward
Chuhan Zhang
Daniel Zoran
Viorica Patraucean
João Carreira
Dima Damen
Andrew Zisserman
48
0
0
02 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
62
0
0
01 Apr 2025
Sample-level Adaptive Knowledge Distillation for Action Recognition
Ping Li
Chenhao Ping
Wenxiao Wang
Mingli Song
49
0
0
01 Apr 2025
Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning
Yubo Zhang
Pedro Botelho
Trevor Gordon
Gil Zussman
I. Kadota
55
0
0
31 Mar 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
56
0
0
30 Mar 2025
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
Antonia Karamolegkou
Malvina Nikandrou
Georgios Pantazopoulos
Danae Sanchez Villegas
Phillip Rust
Ruchira Dhar
Daniel Hershcovich
Anders Søgaard
39
0
0
28 Mar 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
Abdelrahman M. Shaker
Muhammad Maaz
Chenhui Gou
Hamid Rezatofighi
Salman Khan
Fahad Shahbaz Khan
177
0
0
27 Mar 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Nina Shvetsova
Arsha Nagrani
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
53
0
0
24 Mar 2025
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Chengxiang Huang
Yake Wei
Zequn Yang
D. Hu
47
0
0
24 Mar 2025
ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset
Zihao Chen
Hsuanyu Wu
Chi-Hsi Kung
Yi-Ting Chen
Yan-Tsung Peng
47
0
0
24 Mar 2025
Temporal Action Detection Model Compression by Progressive Block Drop
Xiaoyong Chen
Yong Guo
Jiaming Liang
Sitong Zhuang
Runhao Zeng
Xiping Hu
55
0
0
21 Mar 2025
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations
Kyungho Bae
Jinhyung Kim
Sihaeng Lee
Soonyoung Lee
G. Lee
Jinwoo Choi
64
1
0
20 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
59
0
0
20 Mar 2025
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu
Lin Zhang
Pengtao Chen
Peng Ye
Xianfang Zeng
Wei Cheng
Gang Yu
Tao Chen
93
0
0
19 Mar 2025
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Shristi Das Biswas
Efstathia Soufleri
Arani Roy
Kaushik Roy
59
0
0
17 Mar 2025
Efficient Motion-Aware Video MLLM
Zijia Zhao
Yuqi Huo
Tongtian Yue
Longteng Guo
Haoyu Lu
Binghui Wang
Xin Wu
Jiaheng Liu
65
0
0
17 Mar 2025
Action tube generation by person query matching for spatio-temporal action detection
Kazuki Omi
Jion Oshima
Toru Tamaki
65
0
0
17 Mar 2025
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
Yunze Liu
Peiran Wu
C. Liang
Junxiao Shen
Limin Wang
Li Yi
Mamba
56
0
0
16 Mar 2025
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
Haonan Wang
Qixiang Zhang
Lehan Wang
Xuanqi Huang
Xiaomeng Li
VOS
VGen
62
0
0
14 Mar 2025
KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception
Yunpeng Qu
Kun Yuan
Qizhi Xie
Ming-Ting Sun
Chao Zhou
Jian Wang
73
1
0
13 Mar 2025