1907.06987
Cited By
A Short Note on the Kinetics-700 Human Action Dataset
15 July 2019
João Carreira
Eric Noland
Chloe Hillier
Andrew Zisserman
Papers citing
"A Short Note on the Kinetics-700 Human Action Dataset"
50 / 117 papers shown
Title
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
Yi-Xing Peng
Q. Yang
Yu-Ming Tang
Shenghao Fu
Kun-Yu Lin
Xihan Wei
Wei-Shi Zheng
45
0
0
25 Apr 2025
Post-processing for Fair Regression via Explainable SVD
Zhiqun Zuo
Ding Zhu
Mohammad Mahdi Khalili
223
0
0
04 Apr 2025
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Jiawei Wang
Yushen Zuo
Yuanjun Chai
Ziqiang Liu
Yichen Fu
Yichun Feng
Kin-Man Lam
AAML
VLM
47
0
0
02 Apr 2025
Action tube generation by person query matching for spatio-temporal action detection
Kazuki Omi
Jion Oshima
Toru Tamaki
65
0
0
17 Mar 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
122
9
0
18 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
45
0
0
11 Feb 2025
Human Activity Recognition in an Open World
D. Prijatelj
Samuel Grieggs
Jin Huang
Dawei Du
Ameya Shringi
Christopher Funk
Adam Kaufman
Eric Robertson
Walter J. Scheirer
72
3
0
17 Jan 2025
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
79
0
0
18 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
184
0
0
18 Dec 2024
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
221
1
0
30 Oct 2024
Investigating Memorization in Video Diffusion Models
Chong Chen
Enhuai Liu
Daochang Liu
M. Shah
Chang Xu
VGen
DiffM
86
1
0
29 Oct 2024
Egocentric and Exocentric Methods: A Short Survey
Anirudh Thatipelli
Shao-Yuan Lo
Amit K. Roy-Chowdhury
EgoV
42
2
0
27 Oct 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
72
6
0
13 Aug 2024
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
80
3
0
20 Jul 2024
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li
Pengyu Liu
Guoliang Chen
Zhiliang Wu
Hehe Fan
Meng Wang
47
7
0
07 Jul 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
59
10
1
09 Jun 2024
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan
Hongbo Liu
Mading Li
Muyi Sun
Ming Sun
Jiachao Gong
Jinhua Hao
Chao Zhou
Yansong Tang
ViT
60
5
0
28 May 2024
Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations
Mohammed Baharoon
Jonathan Klein
D. L. Michels
SSL
VLM
44
0
0
23 May 2024
GTAutoAct: An Automatic Datasets Generation Framework Based on Game Engine Redevelopment for Action Recognition
Xingyu Song
Zhan Li
Shi Chen
K. Demachi
35
1
0
24 Jan 2024
Video Recognition in Portrait Mode
Mingfei Han
Linjie Yang
Xiaojie Jin
Jiashi Feng
Xiaojun Chang
Heng Wang
30
3
0
21 Dec 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Guohao Li
38
25
0
28 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
28
63
0
30 Oct 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation
Hejun Xiao
Kunyu Peng
Xiangsheng Huang
Alina Roitberg
Hao Li
Zhao Wang
Rainer Stiefelhagen
26
3
0
23 Aug 2023
UnLoc: A Unified Framework for Video Localization Tasks
Shengjia Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
33
53
0
21 Aug 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
54
19
0
13 Jul 2023
Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes
Aran Nayebi
R. Rajalingham
M. Jazayeri
G. R. Yang
36
19
0
19 May 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers
A. Gritsenko
Xuehan Xiong
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
Anurag Arnab
ViT
40
13
0
24 Apr 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
57
156
0
28 Mar 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
81
470
0
27 Mar 2023
HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
Shixiang Tang
Cheng Chen
Qingsong Xie
Meilin Chen
Yizhou Wang
...
Feng Zhu
Haiyang Yang
Li Yi
Rui Zhao
Wanli Ouyang
VLM
32
36
0
10 Mar 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
Chong Chen
Mu Li
ViT
58
144
0
06 Feb 2023
A Survey on Human Action Recognition
Zhou Shuchang
29
0
0
20 Dec 2022
AirfRANS: High Fidelity Computational Fluid Dynamics Dataset for Approximating Reynolds-Averaged Navier-Stokes Solutions
F. Bonnet
Jocelyn Ahmed Mazari
Paola Cinnella
Patrick Gallinari
AI4CE
33
55
0
15 Dec 2022
EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder
Xiaoshui Huang
Zhou Huang
Shengjia Li
Wentao Qu
Tong He
Yuenan Hou
Yifan Zuo
Wanli Ouyang
13
11
0
08 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
38
54
0
06 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
57
311
0
06 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
39
0
0
03 Dec 2022
Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack
Yunfeng Diao
He Wang
Tianjia Shao
Yong-Liang Yang
Kun Zhou
David C. Hogg
Meng Wang
AAML
45
7
0
21 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
30
107
0
17 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
87
681
0
14 Nov 2022
Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions
Yong-Lu Li
Hongwei Fan
Zuoyu Qiu
Yiming Dou
Liang Xu
...
Peiyang Guo
Haisheng Su
Dongliang Wang
Wei Wu
Cewu Lu
35
7
0
14 Nov 2022
Large Scale Real-World Multi-Person Tracking
Bing Shuai
Alessandro Bergamo
Uta Buechler
Andrew G. Berneshawi
Alyssa Boden
Joseph Tighe
18
13
0
03 Nov 2022
MovieCLIP: Visual Scene Recognition in Movies
Digbalay Bose
Rajat Hebbar
Krishna Somandepalli
Haoyang Zhang
Huayu Chen
K. Cole-McLaughlin
Haoran Wang
Shrikanth Narayanan
CLIP
22
21
0
20 Oct 2022
VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
41
7
0
19 Oct 2022
Temporal Action Segmentation: An Analysis of Modern Techniques
Guodong Ding
Fadime Sener
Angela Yao
47
75
0
19 Oct 2022
Transfer-learning for video classification: Video Swin Transformer on multiple domains
Daniel de Oliveira
David Martins de Matos
ViT
29
0
0
18 Oct 2022
Global Semantic Descriptors for Zero-Shot Action Recognition
Valter Estevam
Rayson Laroca
Hélio Pedrini
David Menotti
40
3
0
24 Sep 2022
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain
Francesco Ragusa
Antonino Furnari
G. Farinella
EgoV
48
24
0
19 Sep 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Mian
ViT
28
44
0
13 Sep 2022
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
Mingyuan Zhang
Zhongang Cai
Liang Pan
Fangzhou Hong
Xinying Guo
Lei Yang
Ziwei Liu
DiffM
VGen
58
541
0
31 Aug 2022