2004.06704
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
14 April 2020
Dian Shao
Yue Zhao
Bo Dai
Dahua Lin
Papers citing
"FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding"
45 / 45 papers shown
Title
Hierarchical and Multimodal Data for Daily Activity Understanding
Ghazal Kaviani
Yavuz Yarici
Seulgi Kim
Mohit Prabhushankar
Ghassan AlRegib
Mashhour Solh
Ameya Patil
75
0
0
24 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kai Zhang
MGen
VGen
130
1
0
01 Apr 2025
Video Action Differencing
James Burgess
Xiaohan Wang
Yuhui Zhang
Anita Rau
Alejandro Lozano
Lisa Dunlap
Trevor Darrell
Serena Yeung-Levy
VGen
70
0
0
10 Mar 2025
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
Haibo Tong
Zhaoyang Wang
Zhe Chen
Haonian Ji
Shi Qiu
...
Peng Xia
Mingyu Ding
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
EGVM
VGen
136
3
0
03 Feb 2025
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Mian
Joey Tianyi Zhou
Chen Chen
LRM
83
2
0
15 Nov 2024
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
H. Xia
Zhengbang Yang
Junbo Zou
Rhys Tracy
Yuqing Wang
...
Xun Shao
Zhuoqing Xie
Yuan-fang Wang
Weining Shen
Hanjie Chen
ReLM
LRM
ELM
60
3
0
11 Oct 2024
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li
Pengyu Liu
Guoliang Chen
Zhiliang Wu
Hehe Fan
Meng Wang
78
7
0
07 Jul 2024
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
Haopeng Li
Andong Deng
Qiuhong Ke
Jun Liu
Hossein Rahmani
Yulan Guo
Mohammed Bennamoun
Chen Chen
76
17
0
03 Jan 2024
ConvGRU in Fine-grained Pitching Action Recognition for Action Outcome Prediction
Tianqi Ma
Lin Zhang
Xiumin Diao
Ou Ma
28
3
0
18 Aug 2020
Intra- and Inter-Action Understanding via Temporal Action Parsing
Dian Shao
Yue Zhao
Bo Dai
Dahua Lin
31
71
0
20 May 2020
Temporal Pyramid Network for Action Recognition
Ceyuan Yang
Yinghao Xu
Jianping Shi
Bo Dai
Bolei Zhou
34
369
0
07 Apr 2020
Relational Action Forecasting
Chen Sun
Abhinav Shrivastava
Carl Vondrick
Rahul Sukthankar
Kevin Patrick Murphy
Cordelia Schmid
45
80
0
08 Apr 2019
Video Action Transformer Network
Rohit Girdhar
João Carreira
Carl Doersch
Andrew Zisserman
ViT
107
706
0
06 Dec 2018
CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark
Jiefeng Li
Can Wang
Hao Zhu
Yihuan Mao
Haoshu Fang
Cewu Lu
33
505
0
02 Dec 2018
TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin
Chuang Gan
Song Han
71
1,677
0
20 Nov 2018
Flow-Grounded Spatial-Temporal Video Prediction from Still Images
Yijun Li
Chen Fang
Jimei Yang
Zhaowen Wang
Xin Lu
Ming-Hsuan Yang
3DH
39
138
0
25 Jul 2018
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
Dima Damen
Hazel Doughty
G. Farinella
Sanja Fidler
Antonino Furnari
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
65
1,011
0
08 Apr 2018
Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment
Li Ding
Chenliang Xu
68
180
0
28 Mar 2018
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
Sijie Yan
Yuanjun Xiong
Dahua Lin
GNN
200
4,124
0
23 Jan 2018
Moments in Time Dataset: one million videos for event understanding
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
...
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
67
543
0
09 Jan 2018
A Closer Look at Spatiotemporal Convolutions for Action Recognition
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
179
3,007
0
30 Nov 2017
Temporal Relational Reasoning in Videos
Bolei Zhou
A. Andonian
Aude Oliva
Antonio Torralba
NAI
71
1,035
0
22 Nov 2017
Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification
Ali Diba
Mohsen Fayyaz
Vivek Sharma
A. Karami
M. M. Arzani
Rahman Yousefzadeh
Luc Van Gool
47
241
0
22 Nov 2017
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
192
8,867
0
21 Nov 2017
The "something something" video database for learning and evaluating visual common sense
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
...
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
66
1,507
0
13 Jun 2017
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu
Chen Sun
David A. Ross
Carl Vondrick
C. Pantofaru
...
G. Toderici
Susanna Ricco
Rahul Sukthankar
Cordelia Schmid
Jitendra Malik
VGen
82
1,021
0
23 May 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
192
7,961
0
22 May 2017
The Kinetics Human Action Video Dataset
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
...
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
193
3,771
0
19 May 2017
Temporal Segment Networks for Action Recognition in Videos
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
81
807
0
08 May 2017
Temporal Action Detection with Structured Segment Networks
Yue Zhao
Yuanjun Xiong
Limin Wang
Zhirong Wu
Xiaoou Tang
Dahua Lin
38
911
0
20 Apr 2017
ActionVLAD: Learning spatio-temporal aggregation for action classification
Rohit Girdhar
Deva Ramanan
Abhinav Gupta
Josef Sivic
Bryan C. Russell
AI4TS
56
451
0
10 Apr 2017
Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos
Rui Hou
Chen Chen
M. Shah
MedIm
51
333
0
30 Mar 2017
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
Huijuan Xu
Abir Das
Kate Saenko
3DPC
109
714
0
22 Mar 2017
RMPE: Regional Multi-person Pose Estimation
Haoshu Fang
Shuqin Xie
Yu-Wing Tai
Cewu Lu
3DH
99
1,583
0
01 Dec 2016
Temporal Convolutional Networks for Action Segmentation and Detection
Colin S. Lea
Michael D. Flynn
René Vidal
A. Reiter
Gregory Hager
81
1,478
0
16 Nov 2016
Human Action Localization with Sparse Spatial Supervision
Philippe Weinzaepfel
Xavier Martin
Cordelia Schmid
36
47
0
17 May 2016
Spot On: Action Localization from Pointly-Supervised Proposals
Pascal Mettes
Jan van Gemert
Cees G. M. Snoek
53
126
0
26 Apr 2016
Convolutional Two-Stream Network Fusion for Video Action Recognition
Christoph Feichtenhofer
A. Pinz
Andrew Zisserman
117
2,606
0
22 Apr 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xiaolong Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
67
1,232
0
06 Apr 2016
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
Serena Yeung
Olga Russakovsky
Ning Jin
Mykhaylo Andriluka
Greg Mori
Li Fei-Fei
VLM
62
438
0
21 Jul 2015
P-CNN: Pose-based CNN Features for Action Recognition
Guilhem Chéron
Ivan Laptev
Cordelia Schmid
47
610
0
11 Jun 2015
Recognizing Fine-Grained and Composite Activities using Hand-Centric Features and Script Data
Marcus Rohrbach
Anna Rohrbach
Michaela Regneri
S. Amin
Mykhaylo Andriluka
Manfred Pinkal
Bernt Schiele
66
178
0
23 Feb 2015
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Jeff Donahue
Lisa Anne Hendricks
Marcus Rohrbach
Subhashini Venugopalan
S. Guadarrama
Kate Saenko
Trevor Darrell
VLM
117
6,037
0
17 Nov 2014
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan
Andrew Zisserman
212
7,518
0
09 Jun 2014
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIP
VGen
84
6,100
0
03 Dec 2012