Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.07750
Cited By
v1
v2
v3 (latest)
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
22 May 2017
João Carreira
Andrew Zisserman
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
50 / 3,645 papers shown
Title
Deep Understanding of Sign Language for Sign to Subtitle Alignment
Youngjoon Jang
Jeongsoo Choi
Junseok Ahn
Joon Son Chung
SLR
152
0
0
05 Mar 2025
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance
Zhao Yang
Zezhong Qian
Xiaofan Li
Weixiang Xu
Gongpeng Zhao
Ruohong Yu
Lingsi Zhu
Longjun Liu
DiffM
VGen
120
2
0
05 Mar 2025
Rebalanced Multimodal Learning with Data-aware Unimodal Sampling
Qingyuan Jiang
Zhouyang Chi
Xiao Ma
Qirong Mao
Yang Yang
Jinhui Tang
99
0
0
05 Mar 2025
Video-DPRP: A Differentially Private Approach for Visual Privacy-Preserving Video Human Activity Recognition
Allassan Tchangmena A Nken
Susan Mckeever
Peter Corcoran
Ihsan Ullah
PICV
101
0
0
03 Mar 2025
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
Z. Zhong
Yuli Wang
Lulu Bi
Zhuoqi Ma
S. H. Ahn
...
Webster Stayman
Todd M. Kolb
I. Kamel
Harrison X. Bai
Zhicheng Jiao
LM&MA
93
0
0
03 Mar 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Baoqi Pei
Yuanmin Huang
Jilan Xu
Guo Chen
Yuping He
...
Yali Wang
Weidi Xie
Yu Qiao
Leilei Gan
Limin Wang
96
2
0
02 Mar 2025
CFSum: A Transformer-Based Multi-Modal Video Summarization Framework With Coarse-Fine Fusion
Yaowei Guo
Jiazheng Xing
Xiaojun Hou
Shuo Xin
Juntao Jiang
Demetri Terzopoulos
Chenfanfu Jiang
Yong Liu
ViT
73
0
0
01 Mar 2025
Learning to Animate Images from A Few Videos to Portray Delicate Human Actions
Haoxin Li
Yingchen Yu
Qilong Wu
Hanwang Zhang
Boyang Li
Song Bai
3DH
VGen
499
0
0
01 Mar 2025
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
Xiao Wang
Jingyun Hua
Weihong Lin
Yize Zhang
Fuzheng Zhang
Jianlong Wu
Di Zhang
Liqiang Nie
VLM
146
0
0
28 Feb 2025
Unified Video Action Model
Shuang Li
Yihuai Gao
Dorsa Sadigh
Shuran Song
VGen
158
8
0
28 Feb 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
Shuming Liu
Chen Zhao
Fatimah Zohra
Mattia Soldan
Alejandro Pardo
...
Juan Carlos León Alcázar
A. Cioppa
Silvio Giancola
Carlos Hinojosa
Bernard Ghanem
110
3
0
27 Feb 2025
Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion
Qingyuan Jiang
Longfei Huang
Yang Yang
94
0
0
27 Feb 2025
Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
Pierre Vuillecard
J. Odobez
96
0
0
27 Feb 2025
Data Augmentation for Instruction Following Policies via Trajectory Segmentation
Niklas Höpner
Ilaria Tiddi
H. V. Hoof
82
0
0
25 Feb 2025
ASurvey: Spatiotemporal Consistency in Video Generation
Zhiyu Yin
Kehai Chen
Xuefeng Bai
Ruili Jiang
Junlin Li
Hongdong Li
Jin Liu
Yang Xiang
Jun Yu
Min Zhang
EGVM
VGen
AI4TS
94
0
0
25 Feb 2025
Hierarchical Context Transformer for Multi-level Semantic Scene Understanding
Luoying Hao
Yan Hu
Yang Yue
Li Wu
Huazhu Fu
Jinming Duan
Jiang Liu
99
0
0
24 Feb 2025
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Florent Bartoccioni
Elias Ramzi
Victor Besnier
Shashanka Venkataramanan
Tuan-Hung Vu
...
Mickael Chen
Éloi Zablocki
Andrei Bursuc
Eduardo Valle
Matthieu Cord
VGen
179
2
0
24 Feb 2025
Time Series Domain Adaptation via Latent Invariant Causal Mechanism
Ruichu Cai
Junxian Huang
Zhenhui Yang
Zijian Li
Emadeldeen Eldele
Min Wu
Gang Hua
OOD
CML
BDL
AI4TS
112
0
0
23 Feb 2025
Robust Dynamic Facial Expression Recognition
Feng Liu
Hanyang Wang
Siyuan Shen
70
1
0
22 Feb 2025
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning
Caihua Liu
Xu Li
Wenjing Xue
Wei Tang
Xia Feng
80
0
0
20 Feb 2025
Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity
Yizhuo Lu
Changde Du
Chong Wang
Xuanliu Zhu
Liuyun Jiang
Xujin Li
Huiguang He
VGen
227
4
0
20 Feb 2025
Benchmarking Zero-Shot Facial Emotion Annotation with Large Language Models: A Multi-Class and Multi-Frame Approach in DailyLife
He Zhang
Xinyi Fu
CVBM
87
2
0
18 Feb 2025
Improving action segmentation via explicit similarity measurement
Kamel Aouaidjia
Wenhao Zhang
Aofan Li
Chongsheng Zhang
81
0
0
15 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
89
0
0
11 Feb 2025
A Survey on Mamba Architecture for Vision Applications
Fady Ibrahim
Guangjun Liu
Guanghui Wang
Mamba
166
3
0
11 Feb 2025
History-Guided Video Diffusion
Kiwhan Song
Boyuan Chen
Max Simchowitz
Yilun Du
Russ Tedrake
Vincent Sitzmann
VGen
212
18
0
10 Feb 2025
Pre-Trained Video Generative Models as World Simulators
Haoran He
Yang Zhang
Liang Lin
Zhihao Xu
Ling Pan
VGen
164
5
0
10 Feb 2025
Conformal Predictions for Human Action Recognition with Vision-Language Models
Bary Tim
Fuchs Clément
Macq Benoît
VLM
140
0
0
10 Feb 2025
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Hengzhu Tang
Zefeng Zhang
Zhiping Li
Zhenyu Zhang
Xing Wu
Li Gao
Suqi Cheng
Dawei Yin
111
1
0
09 Feb 2025
MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling
Sharana Dharshikgan Suresh Dass
H. Barua
Ganesh Krishnasamy
Raveendran Paramesran
Raphael C.-W. Phan
130
0
0
06 Feb 2025
AI-Based Thermal Video Analysis in Privacy-Preserving Healthcare: A Case Study on Detecting Time of Birth
Jorge García-Torres
Øyvind Meinich-Bache
Siren Rettedal
K. Engan
77
2
0
05 Feb 2025
BRIDLE: Generalized Self-supervised Learning with Quantization
Hoang M. Nguyen
Satya Narayan Shukla
Qiang Zhang
Hanchao Yu
Sreya D. Roy
Taipeng Tian
Lingjiong Zhu
Yuchen Liu
SSL
MQ
138
0
0
04 Feb 2025
BILLNET: A Binarized Conv3D-LSTM Network with Logic-gated residual architecture for hardware-efficient video inference
Van Thien Nguyen
William Guicquero
Gilles Sicard
3DV
MQ
145
2
0
24 Jan 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
122
1
0
22 Jan 2025
Efficient Lung Ultrasound Severity Scoring Using Dedicated Feature Extractor
Jiaqi Guo
Yunnan Wu
E. Kaimakamis
Georgios Petmezas
Vasileios E. Papageorgiou
N. Maglaveras
Aggelos K. Katsaggelos
182
0
0
21 Jan 2025
CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation
Zheng Chong
Wenqing Zhang
Shiyue Zhang
Jun Zheng
Xiao Dong
Haoxiang Li
Yiling Wu
D. Jiang
Xiaodan Liang
DiffM
80
2
0
20 Jan 2025
Dynamic Scene Understanding from Vision-Language Representations
Shahaf Pruss
Morris Alper
Hadar Averbuch-Elor
OCL
505
0
0
20 Jan 2025
Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics
Tze Ho Elden Tse
Runyang Feng
Linfang Zheng
Jiho Park
Yixing Gao
Jihie Kim
A. Leonardis
H. Chang
130
0
0
13 Jan 2025
A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
Naval Kishore Mehta
Arvind
Himanshu Kumar
Abeer Banerjee
Sumeet Saurav
Sanjay Singh
76
0
0
10 Jan 2025
MS-Temba : Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
153
3
0
10 Jan 2025
ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking
Tingyang Zhang
Chen Wang
Zhiyang Dou
Qingzhe Gao
Jiahui Lei
Baoquan Chen
Lingjie Liu
3DV
119
0
0
06 Jan 2025
Evolving Skeletons: Motion Dynamics in Action Recognition
Jushang Qiu
Lei Wang
162
0
0
05 Jan 2025
Multiscaled Multi-Head Attention-based Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
SLR
113
16
0
03 Jan 2025
SSL Framework for Causal Inconsistency between Structures and Representations
Hang Chen
Xinyu Yang
Keqing Du
Wenya Wang
124
2
0
03 Jan 2025
Beyond Words: AuralLLM and SignMST-C for Sign Language Production and Bidirectional Accessibility
Yulong Li
Yuxuan Zhang
Feilong Tang
Mingyuan Zhou
Zhixiang Lu
...
Jionglong Su
Chong Li
Yifang Wang
Imran Razzak
Jionglong Su
SLR
72
0
0
01 Jan 2025
DFME: A New Benchmark for Dynamic Facial Micro-expression Recognition
Sirui Zhao
Huaying Tang
Xinglong Mao
Shifeng Liu
Hanqing Tao
Hongya Wang
Tong Xu
Enhong Chen
102
3
0
31 Dec 2024
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
134
26
0
31 Dec 2024
Action-Agnostic Point-Level Supervision for Temporal Action Detection
Shuhei M. Yoshida
Takashi Shibata
M. Terao
Takayuki Okatani
Masashi Sugiyama
93
0
0
31 Dec 2024
Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization
Yuanpeng He
Lijian Li
Tianxiang Zhan
Wenpin Jiao
Chi-Man Pun
EDL
121
3
0
27 Dec 2024
TravelAgent: Generative Agents in the Built Environment
Ariel Noyman
Kai Hu
Kent Larson
AI4CE
52
2
0
25 Dec 2024
Previous
1
2
3
4
5
...
71
72
73
Next