1705.07750
Cited By
v1
v2
v3 (latest)
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Papers citing
"Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 3,645 papers shown
Title
Feature Visualization in 3D Convolutional Neural Networks
Chunpeng Li
Ya-tang Li
FAtt
44
0
0
12 May 2025
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining
Lu Dong
Han Zhang
Hongjie Zhang
Yuanmin Huang
Z. Ling
Yu Qiao
Limin Wang
Yun Wang
AI4TS
209
0
0
10 May 2025
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer
Ho-Joong Kim
Y. E. Lee
Jung-Ho Hong
Seong-Whan Lee
179
0
0
09 May 2025
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition
Congqi Cao
Peiheng Han
Y. Zhang
Yating Yu
Qinyi Lv
Lingtong Min
Yanning Zhang
VLM
133
0
0
09 May 2025
TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries
Jinze Lv
Jian Chen
Zi Long
Xianghua Fu
Yin Chen
VGen
132
0
0
09 May 2025
AI-Generated Fall Data: Assessing LLMs and Diffusion Model for Wearable Fall Detection
Sana Alamgeer
Yasine Souissi
Anne H. H. Ngu
74
0
0
07 May 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
133
0
0
06 May 2025
Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges
Hao Xu
Arbind Agrahari Baniya
Sam Well
Mohamed Reda Bouadjenek
Richard Dazeley
S. Aryal
AI4TS
56
0
0
06 May 2025
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision
Linhan Cao
Wei Sun
Kaiwei Zhang
Yicong Peng
Guangtao Zhai
Xiongkuo Min
135
0
0
06 May 2025
Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study
Tamim Ahmed
Thanassis Rikakis
66
0
0
03 May 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM
DiffM
VGen
163
0
0
30 Apr 2025
Exploiting Inter-Sample Correlation and Intra-Sample Redundancy for Partially Relevant Video Retrieval
Junlong Ren
Gangjian Zhang
Yitao Hu
Jian Shu
Haoran Wang
102
0
0
28 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
164
1
0
28 Apr 2025
M2R2: MulitModal Robotic Representation for Temporal Action Segmentation
Daniel Sliwowski
Dongheui Lee
64
1
0
25 Apr 2025
3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models
Min Wei
Chaohui Yu
Jingkai Zhou
Fan Wang
DiffM
VGen
74
2
0
24 Apr 2025
Latent Video Dataset Distillation
Ning Li
Antai Andy Liu
Jingran Zhang
Justin Cui
DD
VGen
146
0
0
23 Apr 2025
SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition
Hasan Algafri
H. Luqman
Sarah Alyami
Issam Laradji
49
0
0
23 Apr 2025
SignX: The Foundation Model for Sign Recognition
Sen Fang
Chunyu Sui
Hongwei Yi
C. Neidle
Dimitris N. Metaxas
SLR
78
0
0
22 Apr 2025
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
Ziyi Liu
Yang Liu
72
1
0
21 Apr 2025
ResNetVLLM -- Multi-modal Vision LLM for the Video Understanding Task
Ahmad Khalil
Mahmoud Khalil
A. Ngom
VLM
118
1
0
20 Apr 2025
Balancing Privacy and Action Performance: A Penalty-Driven Approach to Image Anonymization
Nazia Aslam
Kamal Nasrollahi
PICV
64
0
0
19 Apr 2025
PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition
Jongseo Lee
Wooil Lee
Gyeong-Moon Park
Seong Tae Kim
Jinwoo Choi
139
0
0
17 Apr 2025
Exploring Video-Based Driver Activity Recognition under Noisy Labels
Linjuan Fan
Di Wen
Kunyu Peng
Kailun Yang
J.N. Zhang
...
Yufan Chen
Junwei Zheng
Jiamin Wu
Xudong Han
Rainer Stiefelhagen
NoLa
96
0
0
16 Apr 2025
EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
Jinfeng Xu
Yuanmin Huang
Baoqi Pei
Junlin Hou
Qingqiu Li
Guo Chen
Yuhui Zhang
Rui Feng
Weidi Xie
DiffM
98
4
0
16 Apr 2025
Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition
Hongyu Qu
Ling Xing
Rui Yan
Yazhou Yao
G. Xie
Xiangbo Shu
75
0
0
14 Apr 2025
H-MoRe: Learning Human-centric Motion Representation for Action Analysis
Zhanbo Huang
Xiaoming Liu
Yu Kong
3DH
77
0
0
14 Apr 2025
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak N. Araabi
474
0
0
14 Apr 2025
Hands-On: Segmenting Individual Signs from Continuous Sequences
Low Jian He
Harry Walsh
Ozge Mercanoglu Sincan
Richard Bowden
SLR
73
0
0
11 Apr 2025
F^3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
Zhaoyu Liu
Kan Jiang
Murong Ma
Zhe Hou
Yun Lin
Jin Song Dong
75
0
0
11 Apr 2025
End-to-End Facial Expression Detection in Long Videos
Yini Fang
Alec Diallo
Yiqi Shi
F. Jumelle
Bertram Shi
CVBM
54
0
0
10 Apr 2025
Extending Visual Dynamics for Video-to-Music Generation
Xiaohao Liu
Teng Tu
Yunshan Ma
Tat-Seng Chua
VGen
111
0
0
10 Apr 2025
Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition
Alexander Brettmann
Jakob Grävinghoff
Marlene Rüschoff
Marie Westhues
SLR
86
0
0
10 Apr 2025
Exploring Ordinal Bias in Action Recognition for Instructional Videos
Joochan Kim
Minjoon Jung
Byoung-Tak Zhang
65
0
0
09 Apr 2025
Pose-Aware Weakly-Supervised Action Segmentation
Seth Z. Zhao
Reza Ghoddoosian
Isht Dwivedi
Nakul Agarwal
Behzad Dariush
118
0
0
08 Apr 2025
AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection
Peng Wu
Wanshun Su
Guansong Pang
Yujia Sun
Qingsen Yan
Peng Wang
Yize Zhang
VLM
106
1
0
06 Apr 2025
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
Wenzhuo Liu
Wenshuo Wang
Yicheng Qiao
Qiannan Guo
Jiayin Zhu
...
Huiming Yang
Zhiwei Li
Lening Wang
Tiao Tan
Huaping Liu
97
1
0
03 Apr 2025
SocialGesture: Delving into Multi-person Gesture Understanding
Xu Cao
Pranav Virupaksha
Wenqi Jia
Bolin Lai
Fiona Ryan
Sangmin Lee
James M. Rehg
SLR
91
0
0
03 Apr 2025
A Sensorimotor Vision Transformer
Konrad Gadzicki
K. Schill
C. Zetzsche
142
0
0
03 Apr 2025
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
Chuning Zhu
Raymond Yu
S. Feng
Benjamin Burchfiel
Paarth Shah
Abhishek Gupta
VGen
163
7
0
03 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
91
0
0
02 Apr 2025
Beyond Static Scenes: Camera-controllable Background Generation for Human Motion
Mingshuai Yao
Mengting Chen
Qinye Zhou
Yize Zhang
Ming-Yu Liu
...
Chen Ju
Shuai Xiao
Qingwen Liu
Jinsong Lan
Wangmeng Zuo
DiffM
VGen
118
1
0
01 Apr 2025
TenAd: A Tensor-based Low-rank Black Box Adversarial Attack for Video Classification
Kimia Haghjooei
Mansoor Rezghi
91
0
0
01 Apr 2025
FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection
Xinnan Zhu
Yicheng Zhu
Tixin Chen
Wentao Wu
Yuanjie Dang
116
0
0
01 Apr 2025
Sample-level Adaptive Knowledge Distillation for Action Recognition
Ping Li
Chenhao Ping
Wenxiao Wang
Mingli Song
138
0
0
01 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kai Zhang
MGen
VGen
295
1
0
01 Apr 2025
The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning
Mingkai Tian
Guorong Li
Yuankai Qi
Amin Beheshti
Javen Qinfeng Shi
Anton van den Hengel
Qingming Huang
VGen
67
0
0
31 Mar 2025
FineCausal: A Causal-Based Framework for Interpretable Fine-Grained Action Quality Assessment
Ruisheng Han
Kanglei Zhou
Amir Atapour-Abarghouei
Xiaohui Liang
Hubert P. H. Shum
CML
158
0
0
31 Mar 2025
Towards Precise Action Spotting: Addressing Temporal Misalignment in Labels with Dynamic Label Assignment
Masato Tamura
84
0
0
31 Mar 2025
Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions
Thinesh Thiyakesan Ponbagavathi
Alina Roitberg
62
0
0
31 Mar 2025
OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition
Shihao Cheng
Jinlu Zhang
Yue Liu
Zhigang Tu
VLM
65
0
0
30 Mar 2025