ResearchTrend.AI
The "something something" video database for learning and evaluating visual common sense

13 June 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
Heuna Kim
V. Haenel
Ingo Fründ
P. Yianilos
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
    VLM

Papers citing "The "something something" video database for learning and evaluating visual common sense"

50 / 322 papers shown
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
26
0
0
01 Apr 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
57
155
0
28 Mar 2023
Seer: Language Instructed Video Prediction with Latent Diffusion Models
Xianfan Gu
Chuan Wen
Weirui Ye
Jiaming Song
Yang Gao
DiffM
VGen
21
40
0
27 Mar 2023
Learning video embedding space with Natural Language Supervision
P. Uppala
Abhishek Bamotra
S. Priya
Vaidehi Joshi
CLIP
23
1
0
25 Mar 2023
Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition
Junyan Wang
Zhenhong Sun
Yichen Qian
Dong Gong
Xiuyu Sun
Ming Lin
M. Pagnucco
Yang Song
3DPC
20
11
0
05 Mar 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Percy Liang
LM&Ro
SSL
47
145
0
24 Feb 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
Chong Chen
Mu Li
ViT
58
144
0
06 Feb 2023
The Construction of Reality in an AI: A Review
J. W. Johnston
3DV
13
1
0
03 Feb 2023
CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition
Cheng Lu
Xiaojie Jin
Zhicheng Huang
Qibin Hou
Ming-Ming Cheng
Jiashi Feng
37
8
0
15 Jan 2023
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
J. Denize
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
SSL
19
6
0
21 Dec 2022
A Survey on Human Action Recognition
Zhou Shuchang
29
0
0
20 Dec 2022
Egocentric Video Task Translation
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
29
13
0
13 Dec 2022
MAGVIT: Masked Generative Video Transformer
Lijun Yu
Yong Cheng
Kihyuk Sohn
José Lezama
Han Zhang
...
Alexander G. Hauptmann
Ming-Hsuan Yang
Yuan Hao
Irfan Essa
Lu Jiang
DiffM
VGen
38
228
0
10 Dec 2022
Multimodal Prototype-Enhanced Network for Few-Shot Action Recognition
Xin Ni
Yong Liu
Hao Wen
Yatai Ji
Jing Xiao
Yujiu Yang
37
9
0
09 Dec 2022
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
Jie Jiang
Zhimin Li
Jiangfeng Xiong
Rongwei Quan
Qinglin Lu
Wei Liu
27
2
0
09 Dec 2022
VideoDex: Learning Dexterity from Internet Videos
Kenneth Shaw
Shikhar Bahl
Deepak Pathak
30
89
0
08 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data
Roei Herzig
Ofir Abramovich
Elad Ben-Avraham
Assaf Arbelle
Leonid Karlinsky
Ariel Shamir
Trevor Darrell
Amir Globerson
41
16
0
08 Dec 2022
Fine-tuned CLIP Models are Efficient Video Learners
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
CLIP
VLM
34
150
0
06 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
36
54
0
06 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
57
309
0
06 Dec 2022
Self-supervised and Weakly Supervised Contrastive Learning for Frame-wise Action Representations
Minghao Chen
Renbo Tu
Chenxi Huang
Yuqi Lin
Boxi Wu
Deng Cai
SSL
AI4TS
32
1
0
06 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
39
0
0
03 Dec 2022
Lightweight Structure-Aware Attention for Visual Understanding
Heeseung Kwon
F. M. Castro
M. Marín-Jiménez
N. Guil
Alahari Karteek
28
2
0
29 Nov 2022
Proactive Robot Assistance via Spatio-Temporal Object Modeling
Maithili Patel
Sonia Chernova
29
26
0
28 Nov 2022
Video Test-Time Adaptation for Action Recognition
Wei Lin
M. Jehanzeb Mirza
Mateusz Koziński
Horst Possegger
Hilde Kuehne
Horst Bischof
TTA
47
31
0
24 Nov 2022
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
Guoxi Huang
A. Bors
27
1
0
23 Nov 2022
Knowledge Prompting for Few-shot Action Recognition
Yuheng Shi
Xinxiao Wu
Hanxi Lin
VLM
19
4
0
22 Nov 2022
Look More but Care Less in Video Recognition
Yitian Zhang
Yue Bai
Haiquan Wang
Yi Xu
Yun Fu
27
9
0
18 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
30
107
0
17 Nov 2022
Exploring adaptation of VideoMAE for Audio-Visual Diarization & Social @ Ego4d Looking at me Challenge
Yinan He
Guo Chen
14
0
0
17 Nov 2022
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Yuecong Xu
Haozhi Cao
Zhenghua Chen
Xiaoli Li
Lihua Xie
Jianfei Yang
24
14
0
17 Nov 2022
Learning Reward Functions for Robotic Manipulation by Observing Humans
Minttu Alakuijala
Gabriel Dulac-Arnold
Julien Mairal
Jean Ponce
Cordelia Schmid
OffRL
37
26
0
16 Nov 2022
Dynamic Temporal Filtering in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Chong-Wah Ngo
Tao Mei
AI4TS
24
17
0
15 Nov 2022
Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions
Yong-Lu Li
Hongwei Fan
Zuoyu Qiu
Yiming Dou
Liang Xu
...
Peiyang Guo
Haisheng Su
Dongliang Wang
Wei Wu
Cewu Lu
35
7
0
14 Nov 2022
Metaphors We Learn By
Roland Memisevic
29
0
0
11 Nov 2022
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
R. Devon Hjelm
Neel Joshi
Sarath Chandar
ViT
27
0
0
11 Nov 2022
Extending Temporal Data Augmentation for Video Action Recognition
Artjoms Gorpincenko
Michal Mackiewicz
ViT
29
4
0
09 Nov 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B. Rangrej
Kevin J Liang
Tal Hassner
James J. Clark
27
3
0
24 Oct 2022
Solving Reasoning Tasks with a Slot Transformer
Ryan Faulkner
Daniel Zoran
LRM
26
1
0
20 Oct 2022
MovieCLIP: Visual Scene Recognition in Movies
Digbalay Bose
Rajat Hebbar
Krishna Somandepalli
Haoyang Zhang
Huayu Chen
K. Cole-McLaughlin
Haoran Wang
Shrikanth Narayanan
CLIP
22
21
0
20 Oct 2022
Transfer-learning for video classification: Video Swin Transformer on multiple domains
Daniel de Oliveira
D. Matos
ViT
24
0
0
18 Oct 2022
S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces
Eric N. D. Nguyen
Karan Goel
Albert Gu
Gordon W. Downs
Preey Shah
Tri Dao
S. Baccus
Christopher Ré
VLM
22
39
0
12 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
20
68
0
12 Oct 2022
Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment
Haoning Wu
Chaofeng Chen
Liang Liao
Jingwen Hou
Wenxiu Sun
Qiong Yan
Liang Feng
Weisi Lin
51
44
0
11 Oct 2022
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang
Deng Huang
Bin Wen
Jiannan Wu
H. Yao
Yi-Xin Jiang
Xiatian Zhu
Zehuan Yuan
37
19
0
09 Oct 2022
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain
Francesco Ragusa
Antonino Furnari
G. Farinella
EgoV
43
24
0
19 Sep 2022
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
Hongwei Xue
Yuchong Sun
Bei Liu
Jianlong Fu
Rui Song
Houqiang Li
Jiebo Luo
CLIP
VLM
25
68
0
14 Sep 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Mian
ViT
19
44
0
13 Sep 2022
Efficient Attention-free Video Shift Transformers
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
ViT
29
1
0
23 Aug 2022
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding
Stephen Su
Sam Kwong
Qingyu Zhao
De-An Huang
Juan Carlos Niebles
Ehsan Adeli
27
0
0
22 Aug 2022