2010.10864
A Short Note on the Kinetics-700-2020 Human Action Dataset
21 October 2020
Lucas Smaira
João Carreira
Eric Noland
Ellen Clancy
Amy Wu
Andrew Zisserman
Papers citing
"A Short Note on the Kinetics-700-2020 Human Action Dataset"
50 / 71 papers shown
Title
Can Visuo-motor Policies Benefit from Random Exploration Data? A Case Study on Stacking
Shutong Jin
Axel Kaliff
Ruiyu Wang
Muhammad Zahid
Florian T. Pokorny
VGen
39
0
0
30 Mar 2025
Human Activity Recognition in an Open World
D. Prijatelj
Samuel Grieggs
Jin Huang
Dawei Du
Ameya Shringi
Christopher Funk
Adam Kaufman
Eric Robertson
Walter J. Scheirer (University of Notre Dame)
72
3
0
17 Jan 2025
Human Action Anticipation: A Survey
Bolin Lai
Sam Toyer
Tushar Nagarajan
Rohit Girdhar
S. Zha
James M. Rehg
Kris Kitani
Kristen Grauman
Ruta Desai
Miao Liu
AI4TS
41
1
0
17 Oct 2024
HVM-1: Large-scale video models pretrained with nearly 5000 hours of human-like video data
Emin Orhan
VLM
SyDa
43
1
0
25 Jul 2024
OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
Andrew Zisserman
34
2
0
24 Jul 2024
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar
Xiaohan Wang
Yonatan Bitton
Idan Szpektor
Serena Yeung-Levy
VLM
LRM
58
8
0
08 Jul 2024
Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification
Jiaying Shi
Xuetong Xue
Shenghui Xu
VLM
45
0
0
08 Jul 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan
Jinfa Huang
Yongqi Xu
Yaoyang Liu
Shaofeng Zhang
Yujun Shi
Ruijie Zhu
Xinhua Cheng
Jiebo Luo
Li Yuan
EGVM
VGen
77
34
0
26 Jun 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
45
0
0
11 Jun 2024
EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos
Masashi Hatano
Ryo Hachiuma
Hideo Saito
EgoV
37
3
0
30 May 2024
Contextual Emotion Recognition using Large Vision Language Models
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
VLM
73
3
0
14 May 2024
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
Haozhe Cheng
Chen Ju
Haicheng Wang
Jinxiang Liu
Mengting Chen
Qiang Hu
Xiaoyun Zhang
Yanfeng Wang
DiffM
VLM
43
5
0
23 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
54
3
0
06 Apr 2024
A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval
Andreea-Maria Oncescu
João F. Henriques
Andrew Zisserman
Samuel Albanie
A. Sophia Koepke
28
5
0
29 Feb 2024
Self-supervised learning of video representations from a child's perspective
A. Orhan
Wentao Wang
Alex N. Wang
Mengye Ren
Brenden M. Lake
32
4
0
01 Feb 2024
Deep Learning for Computer Vision based Activity Recognition and Fall Detection of the Elderly: a Systematic Review
F. X. Gaya-Morey
Cristina Manresa-Yee
Jose Maria Buades Rubio
31
12
0
22 Jan 2024
Learning from One Continuous Video Stream
João Carreira
Michael King
Viorica Patraucean
Dilara Gokay
Catalin Ionescu
...
Joseph Heyward
Carl Doersch
Y. Aytar
Dima Damen
Andrew Zisserman
CLL
32
4
0
01 Dec 2023
ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos
Te-Lin Wu
Zi-Yi Dou
Qingyuan Hu
Yu Hou
Nischal Reddy Chandra
Marjorie Freedman
R. Weischedel
Nanyun Peng
41
5
0
02 Nov 2023
Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
35
2
0
30 Oct 2023
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
CLIP
VLM
VGen
54
2
0
30 Oct 2023
An Unbiased Look at Datasets for Visuo-Motor Pre-Training
Sudeep Dasari
Mohan Kumar Srirama
Unnat Jain
Abhinav Gupta
SSL
34
36
0
13 Oct 2023
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
Avinash Madasu
Anahita Bhiwandiwalla
Vasudev Lal
VLM
37
0
0
07 Oct 2023
Beyond the Benchmark: Detecting Diverse Anomalies in Videos
Yoav Arad
Michael Werman
16
2
0
03 Oct 2023
Action Recognition Utilizing YGAR Dataset
Shuo Wang
Amiya Ranjan
Lawrence Jiang
12
0
0
02 Oct 2023
Natural Language Supervision for General-Purpose Audio Representations
Benjamin Elizalde
Soham Deshmukh
Huaming Wang
AuLLM
AI4TS
29
53
0
11 Sep 2023
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
Jiaxi Gu
Shicong Wang
Haoyu Zhao
Tianyi Lu
Xing Zhang
Zuxuan Wu
Songcen Xu
Wei Zhang
Yu-Gang Jiang
Hang Xu
DiffM
VGen
39
44
0
07 Sep 2023
Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions
Michael Joannou
P. Rotshtein
U. Noppeney
21
0
0
18 Aug 2023
Dual-Stream Diffusion Net for Text-to-Video Generation
Binhui Liu
Xin Liu
Anbo Dai
Zhiyong Zeng
Dan Wang
Zhen Cui
Jian Yang
DiffM
VGen
20
9
0
16 Aug 2023
Learning high-level visual representations from a child's perspective without strong inductive biases
A. Orhan
Brenden M. Lake
SSL
24
18
0
24 May 2023
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation
Xilun Chen
L. Yu
Wenhan Xiong
Barlas Oğuz
Yashar Mehdad
Wen-tau Yih
VGen
26
3
0
04 May 2023
Cross-view Action Recognition via Contrastive View-invariant Representation
Yuexi Zhang
Dan Luo
Balaji Sundareshan
Mario Sznaier
Octavia Camps
34
0
0
02 May 2023
Learning Human-Human Interactions in Images from Weak Textual Supervision
Morris Alper
Hadar Averbuch-Elor
VLM
47
2
0
27 Apr 2023
ATTACH Dataset: Annotated Two-Handed Assembly Actions for Human Action Understanding
Dustin Aganian
Benedict Stephan
M. Eisenbach
Corinna Stretz
H. Groß
24
11
0
17 Apr 2023
Learning video embedding space with Natural Language Supervision
P. Uppala
Abhishek Bamotra
S. Priya
Vaidehi Joshi
CLIP
23
1
0
25 Mar 2023
VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]
Maureen Daum
Enhao Zhang
Dong He
Stephen Mussmann
Brandon Haynes
Ranjay Krishna
Magdalena Balazinska
32
4
0
07 Mar 2023
Evidence-empowered Transfer Learning for Alzheimer's Disease
Kai Tzu-iunn Ong
Hana Kim
Minjin Kim
Jinseong Jang
B. Sohn
Y. Choi
D. Hwang
Seong Jae Hwang
Jinyoung Yeo
MedIm
17
5
0
02 Mar 2023
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022
Pierre-Etienne Martin
J. Calandre
Boris Mansencal
J. Benois-Pineau
Renaud Péteri
L. Mascarilla
J. Morlier
32
4
0
31 Jan 2023
Transformers in Action Recognition: A Review on Temporal Modeling
Elham Shabaninia
Hossein Nezamabadi-pour
Fatemeh Shafizadegan
ViT
29
8
0
29 Dec 2022
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Yuecong Xu
Haozhi Cao
Zhenghua Chen
Xiaoli Li
Lihua Xie
Jianfei Yang
24
14
0
17 Nov 2022
Active Acquisition for Multimodal Temporal Data: A Challenging Decision-Making Task
Jannik Kossen
Cătălina Cangea
Eszter Vértes
Andrew Jaegle
Viorica Patraucean
Ira Ktena
Nenad Tomašev
Danielle Belgrave
35
8
0
09 Nov 2022
Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization
Junru Wu
Yi Liang
Feng Han
Hassan Akbari
Zhangyang Wang
Cong Yu
39
9
0
03 Nov 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
37
51
0
13 Oct 2022
Leveraging Self-Supervised Training for Unintentional Action Recognition
Enea Duka
Anna Kukleva
Bernt Schiele
38
1
0
23 Sep 2022
MAiVAR: Multimodal Audio-Image and Video Action Recognizer
Muhammad Bilal Shaikh
Douglas Chai
S. Islam
Naveed Akhtar
32
5
0
11 Sep 2022
Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living
Zdravko Marinov
David Schneider
Alina Roitberg
Rainer Stiefelhagen
VGen
32
2
0
03 Aug 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
41
158
0
03 Jun 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
38
29
0
13 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
85
1,262
0
04 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
46
3,360
0
29 Apr 2022
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
Alexandros Stergiou
Dima Damen
AI4TS
EgoV
EDL
17
7
0
28 Apr 2022