1706.04261
Cited By
The "something something" video database for learning and evaluating visual common sense
13 June 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
Heuna Kim
V. Haenel
Ingo Fründ
P. Yianilos
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
Papers citing
"The "something something" video database for learning and evaluating visual common sense"
50 / 308 papers shown
Title
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
Xingrui Wang
Wufei Ma
Angtian Wang
Shuo Chen
Adam Kortylewski
Alan L. Yuille
34
3
0
02 Jun 2024
Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics
Minttu Alakuijala
Reginald McLean
Isaac Woungang
Nariman Farsad
Samuel Kaski
Pekka Marttinen
Kai Yuan
LM&Ro
42
1
0
30 May 2024
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Jialong Wu
Shaofeng Yin
Ningya Feng
Xu He
Dong Li
Jianye Hao
Mingsheng Long
VGen
49
22
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
42
0
23 May 2024
Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding
Rong Gao
Xin Liu
Bohao Xing
Zitong Yu
Björn W. Schuller
Heikki Kälviäinen
57
3
0
21 May 2024
MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
Hongyu Qu
Rui Yan
Xiangbo Shu
Haoliang Gao
Peng Huang
Guo-Sen Xie
61
4
0
03 May 2024
VIEW: Visual Imitation Learning with Waypoints
Ananth Jonnavittula
Sagar Parekh
Dylan P. Losey
SSL
91
9
0
27 Apr 2024
Rank2Reward: Learning Shaped Reward Functions from Passive Video
Daniel Yang
Davin Tjia
Jacob Berg
Dima Damen
Pulkit Agrawal
Abhishek Gupta
OffRL
37
5
0
23 Apr 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
43
29
0
20 Feb 2024
VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation
Jialu Li
Aishwarya Padmakumar
Gaurav Sukhatme
Mohit Bansal
29
6
0
05 Feb 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Junlong Du
Yue Fan
Qing Li
Qing Li
Yuntao Du
VLM
75
75
0
03 Feb 2024
Learning to Visually Connect Actions and their Effects
Eric Peh
Paritosh Parmar
Basura Fernando
24
2
0
19 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Jie Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
67
1
0
15 Jan 2024
PEEKABOO: Interactive Video Generation via Masked-Diffusion
Yash Jain
Anshul Nasery
Vibhav Vineet
Harkirat Singh Behl
VGen
36
30
0
12 Dec 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
61
399
0
28 Nov 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
26
1
0
28 Nov 2023
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
Wenhao Wu
Huanjin Yao
Mengxi Zhang
Yuxin Song
Wanli Ouyang
Jingdong Wang
VLM
28
29
0
27 Nov 2023
Learning Human Action Recognition Representations Without Real Humans
Howard Zhong
Samarth Mishra
Donghyun Kim
SouYoung Jin
Rameswar Panda
Hildegard Kuehne
Leonid Karlinsky
Venkatesh Saligrama
Aude Oliva
Rogerio Feris
24
3
0
10 Nov 2023
Automated Sperm Assessment Framework and Neural Network Specialized for Sperm Video Recognition
T. Fujii
Hayato Nakagawa
T. Takeshima
Y. Yumura
T. Hamagami
28
3
0
10 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
29
64
0
07 Nov 2023
What Makes Pre-Trained Visual Representations Successful for Robust Manipulation?
Kaylee Burns
Zach Witzel
Jubayer Ibn Hamid
Tianhe Yu
Chelsea Finn
Karol Hausman
OOD
SSL
29
22
0
03 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
28
63
0
30 Oct 2023
S3Aug: Segmentation, Sampling, and Shift for Action Recognition
Taiki Sugiura
Toru Tamaki
AI4TS
27
2
0
23 Oct 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
30
7
0
16 Oct 2023
DyST: Towards Dynamic Neural Scene Representations on Real-World Videos
Maximilian Seitzer
Sjoerd van Steenkiste
Thomas Kipf
Klaus Greff
Mehdi S. M. Sajjadi
VGen
ViT
34
8
0
09 Oct 2023
SkeleTR: Towrads Skeleton-based Action Recognition in the Wild
Haodong Duan
Mingze Xu
Bing Shuai
Davide Modolo
Zhuowen Tu
Joseph Tighe
Alessandro Bergamo
ViT
35
1
0
20 Sep 2023
FrameRS: A Video Frame Compression Model Composed by Self supervised Video Frame Reconstructor and Key Frame Selector
Qiqian Fu
Guanhong Wang
Gaoang Wang
22
0
0
16 Sep 2023
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
Palaash Agrawal
Haidi Azaman
Cheston Tan
51
3
0
13 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
25
9
0
05 Sep 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
39
30
0
21 Aug 2023
M^3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition
Hao Tang
Jun Liu
Shuanglin Yan
Rui Yan
Zechao Li
Jinhui Tang
21
37
0
06 Aug 2023
A Survey on Deep Learning-based Spatio-temporal Action Detection
Peng Wang
Fanwei Zeng
Yu Qian
34
5
0
03 Aug 2023
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Bohao Li
Rui Wang
Guangzhi Wang
Yuying Ge
Yixiao Ge
Ying Shan
MLLM
ELM
32
502
0
30 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
40
8
0
18 Jul 2023
Multimodal Distillation for Egocentric Action Recognition
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
23
23
0
14 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
54
19
0
13 Jul 2023
Make A Long Image Short: Adaptive Token Length for Vision Transformers
Yuqin Zhu
Yichen Zhu
ViT
72
17
0
05 Jul 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
38
14
0
20 Jun 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue
Kristen Grauman
EgoV
38
31
0
08 Jun 2023
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning
Jialong Wu
Haoyu Ma
Chao Deng
Mingsheng Long
OffRL
31
24
0
29 May 2023
Visual Affordance Prediction for Guiding Robot Exploration
Homanga Bharadhwaj
Abhi Gupta
Shubham Tulsiani
44
12
0
28 May 2023
Deep Neural Networks in Video Human Action Recognition: A Review
Zihan Wang
Yang Yang
Zhi Liu
Y. Zheng
56
4
0
25 May 2023
Is end-to-end learning enough for fitness activity recognition?
Antoine Mercier
Guillaume Berger
Sunny Panchal
Florian Letsch
Cornelius Boehm
Nahua Kang
Ingo Bax
Roland Memisevic
23
2
0
14 May 2023
Improve Video Representation with Temporal Adversarial Augmentation
Jinhao Duan
Quanfu Fan
Hao-Ran Cheng
Xiaoshuang Shi
Kaidi Xu
AAML
AI4TS
ViT
31
2
0
28 Apr 2023
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision
Jiani Huang
Ziyang Li
Mayur Naik
Ser-Nam Lim
37
3
0
15 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
116
3,041
0
14 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
36
74
0
06 Apr 2023
MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition
Xiang Wang
Shiwei Zhang
Zhiwu Qing
Changxin Gao
Yingya Zhang
Deli Zhao
Nong Sang
24
40
0
03 Apr 2023
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
26
0
0
01 Apr 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
57
155
0
28 Mar 2023