ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1706.04261
  4. Cited By
The "something something" video database for learning and evaluating
  visual common sense

The "something something" video database for learning and evaluating visual common sense

13 June 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
Heuna Kim
V. Haenel
Ingo Fründ
P. Yianilos
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
    VLM
ArXivPDFHTML

Papers citing "The "something something" video database for learning and evaluating visual common sense"

50 / 314 papers shown
Title
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for
  Cross-Domain Robustness in Action Recognition
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
Sofia Broomé
Ernest Pokropek
Boyu Li
Hedvig Kjellström
21
7
0
22 Dec 2021
Precondition and Effect Reasoning for Action Recognition
Precondition and Effect Reasoning for Action Recognition
Hongsang Yoo
Haopeng Li
Qiuhong Ke
Liangchen Liu
Rui Zhang
CML
41
4
0
19 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
88
655
0
16 Dec 2021
Co-training Transformer with Videos and Images Improves Action
  Recognition
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
28
54
0
14 Dec 2021
SVIP: Sequence VerIfication for Procedures in Videos
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
29
17
0
13 Dec 2021
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Anirudh Thatipelli
Sanath Narayan
Salman Khan
Rao Muhammad Anwer
Fahad Shahbaz Khan
Guohao Li
ViT
27
88
0
09 Dec 2021
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural
  Architecture Search
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search
Yi Ding
Xinyu Gong
Junru Wu
Humphrey Shi
Zhicheng Yan
Zhangyang Wang
VGen
52
1
0
09 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video
  Recognition
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Keli Zhang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
32
21
0
09 Dec 2021
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for
  Few-shot Video Classification
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification
Rex Liu
Huan Zhang
Hamed Pirsiavash
Xin Liu
ViT
22
11
0
08 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation
  Learning
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge Belongie
Ming-Hsuan Yang
Hartwig Adam
Huayu Chen
AI4TS
59
6
0
08 Dec 2021
TCGL: Temporal Contrastive Graph for Self-supervised Video
  Representation Learning
TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning
Yang Liu
Keze Wang
Lingbo Liu
Hao Lan
Liang Lin
SSL
AI4TS
53
113
0
07 Dec 2021
BEVT: BERT Pretraining of Video Transformers
BEVT: BERT Pretraining of Video Transformers
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu-Gang Jiang
Luowei Zhou
Lu Yuan
ViT
39
203
0
02 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
75
677
0
02 Dec 2021
Self-supervised Video Transformer
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Michael S. Ryoo
ViT
39
84
0
02 Dec 2021
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
Baining Guo
ViT
48
238
0
24 Nov 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
21
63
0
23 Nov 2021
M2A: Motion Aware Attention for Accurate Video Action Recognition
M2A: Motion Aware Attention for Accurate Video Action Recognition
Brennan Gebotys
Alexander Wong
David A Clausi
27
3
0
18 Nov 2021
Evaluating Transformers for Lightweight Action Recognition
Evaluating Transformers for Lightweight Action Recognition
Raivo Koot
Markus Hennerbichler
Haiping Lu
ViT
28
8
0
18 Nov 2021
Relational Self-Attention: What's Missing in Attention for Video
  Understanding
Relational Self-Attention: What's Missing in Attention for Video Understanding
Manjin Kim
Heeseung Kwon
Chunyu Wang
Suha Kwak
Minsu Cho
ViT
27
28
0
02 Nov 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
21
24
0
27 Oct 2021
A Closer Look at Few-Shot Video Classification: A New Baseline and
  Benchmark
A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark
Zhenxi Zhu
Limin Wang
Sheng Guo
Gangshan Wu
43
32
0
24 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
250
1,024
0
13 Oct 2021
Object-Region Video Transformers
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
30
82
0
13 Oct 2021
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy
  Labels
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels
Mohit Sharma
Rajkumar Patra
Harshali Desai
Shruti Vyas
Yogesh S Rawat
R. Shah
VGen
NoLa
16
3
0
13 Oct 2021
TAda! Temporally-Adaptive Convolutions for Video Understanding
TAda! Temporally-Adaptive Convolutions for Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Mingqian Tang
Ziwei Liu
M. Ang
43
49
0
12 Oct 2021
Joint Learning On The Hierarchy Representation for Fine-Grained Human
  Action Recognition
Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition
M. C. Leong
Hui Li Tan
Haosong Zhang
Liyuan Li
Feng Lin
J. Lim
37
10
0
12 Oct 2021
Motion-aware Contrastive Video Representation Learning via
  Foreground-background Merging
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
Shuangrui Ding
Maomao Li
Tianyu Yang
Rui Qian
Haohang Xu
Qingyi Chen
Jue Wang
Hongkai Xiong
SSL
28
49
0
30 Sep 2021
Searching for Two-Stream Models in Multivariate Space for Video
  Recognition
Searching for Two-Stream Models in Multivariate Space for Video Recognition
Xinyu Gong
Heng Wang
Zheng Shou
Matt Feiszli
Zhangyang Wang
Zhicheng Yan
30
9
0
30 Aug 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action
  Recognition
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
Jiawei Chen
C. Ho
ViT
26
77
0
20 Aug 2021
Temporal Kernel Consistency for Blind Video Super-Resolution
Temporal Kernel Consistency for Blind Video Super-Resolution
Li Xiang
Royson Lee
Mohamed S. Abdelfattah
Nicholas D. Lane
Hongkai Wen
SupR
29
0
0
18 Aug 2021
Temporal Action Segmentation with High-level Complex Activity Labels
Temporal Action Segmentation with High-level Complex Activity Labels
Guodong Ding
Angela Yao
33
18
0
15 Aug 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual,
  Interactive, and Ecological Environments
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments
S. Srivastava
Chengshu Li
Michael Lingelbach
Roberto Martín-Martín
Fei Xia
...
Chenxi Liu
Silvio Savarese
H. Gweon
Jiajun Wu
Li Fei-Fei
LM&Ro
151
157
0
06 Aug 2021
Adaptive Recursive Circle Framework for Fine-grained Action Recognition
Adaptive Recursive Circle Framework for Fine-grained Action Recognition
Hanxi Lin
Xinxiao Wu
Jiebo Luo
25
1
0
25 Jul 2021
EAN: Event Adaptive Network for Enhanced Action Recognition
EAN: Event Adaptive Network for Enhanced Action Recognition
Yuan Tian
Yichao Yan
Guangtao Zhai
G. Guo
Zhiyong Gao
35
41
0
22 Jul 2021
TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition
TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition
Shuyuan Li
Huabin Liu
Rui Qian
Yuxi Li
John See
Mengjuan Fei
Xiaoyuan Yu
W. Lin
20
75
0
10 Jul 2021
Can An Image Classifier Suffice For Action Recognition?
Can An Image Classifier Suffice For Action Recognition?
Quanfu Fan
Chun-Fu Chen
Chen
Yikang Shen
ViT
29
33
0
26 Jun 2021
Space-time Mixing Attention for Video Transformer
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
30
124
0
10 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
18
274
0
09 Jun 2021
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in
  Time
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time
Shao-Wei Liu
Hanwen Jiang
Jiarui Xu
Sifei Liu
Xiaolong Wang
3DH
35
161
0
09 Jun 2021
Towards Training Stronger Video Vision Transformers for
  EPIC-KITCHENS-100 Action Recognition
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition
Ziyuan Huang
Zhiwu Qing
Xiang Wang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Zhurong Xia
Mingqian Tang
Nong Sang
M. Ang
ViT
24
11
0
09 Jun 2021
CT-Net: Channel Tensorization Network for Video Classification
CT-Net: Channel Tensorization Network for Video Classification
Kunchang Li
Xianhang Li
Yali Wang
Jun Wang
Yu Qiao
ViT
22
55
0
03 Jun 2021
DSANet: Dynamic Segment Aggregation Network for Video-Level
  Representation Learning
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning
Wenhao Wu
Yuxiang Zhao
Yanwu Xu
Xiao Tan
Dongliang He
...
Jinxing Ye
Yingying Li
Mingde Yao
Zichao Dong
Yifeng Shi
AI4TS
27
27
0
25 May 2021
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized
  Sports Actions
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Yixuan Li
Lei Chen
Runyu He
Zhenzhi Wang
Gangshan Wu
Limin Wang
27
97
0
16 May 2021
Adaptive Focus for Efficient Video Recognition
Adaptive Focus for Efficient Video Recognition
Yulin Wang
Zhaoxi Chen
Haojun Jiang
Shiji Song
Yizeng Han
Gao Huang
42
98
0
07 May 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation
  Learning
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
39
257
0
29 Apr 2021
Multiscale Vision Transformers
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
Yuan Zhi
Zhan Tong
Limin Wang
Gangshan Wu
TTA
19
72
0
20 Apr 2021
Higher Order Recurrent Space-Time Transformer for Video Action
  Prediction
Higher Order Recurrent Space-Time Transformer for Video Action Prediction
Tsung-Ming Tai
G. Fiameni
Cheng-Kuang Lee
Oswald Lanz
30
9
0
17 Apr 2021
Ego-Exo: Transferring Visual Representations from Third-person to
  First-person Videos
Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos
Yanghao Li
Tushar Nagarajan
Bo Xiong
Kristen Grauman
EgoV
51
84
0
16 Apr 2021
Few-Shot Action Recognition with Compromised Metric via Optimal
  Transport
Few-Shot Action Recognition with Compromised Metric via Optimal Transport
Su Lu
Han-Jia Ye
De-Chuan Zhan
26
18
0
08 Apr 2021
Previous
1234567
Next