ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1907.06987
  4. Cited By
A Short Note on the Kinetics-700 Human Action Dataset

A Short Note on the Kinetics-700 Human Action Dataset

15 July 2019
João Carreira
Eric Noland
Chloe Hillier
Andrew Zisserman
ArXivPDFHTML

Papers citing "A Short Note on the Kinetics-700 Human Action Dataset"

50 / 118 papers shown
Title
Actor-identified Spatiotemporal Action Detection -- Detecting Who Is
  Doing What in Videos
Actor-identified Spatiotemporal Action Detection -- Detecting Who Is Doing What in Videos
Fan Yang
Norimichi Ukita
S. Sakti
Satoshi Nakamura
19
0
0
27 Aug 2022
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition
  Analysis for Adversarial Multi-task Video Understanding
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding
Stephen Su
Sam Kwong
Qingyu Zhao
De-An Huang
Juan Carlos Niebles
Ehsan Adeli
27
0
0
22 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New
  Perspective
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
46
55
0
20 Aug 2022
Adaptive occlusion sensitivity analysis for visually explaining video
  recognition networks
Adaptive occlusion sensitivity analysis for visually explaining video recognition networks
Tomoki Uchiyama
Naoya Sogi
S. Iizuka
Koichiro Niinuma
Kazuhiro Fukui
24
2
0
26 Jul 2022
CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
Haoning Zhu
Wayne Wu
Wentao Zhu
Liming Jiang
Siwei Tang
Li Zhang
Ziwei Liu
Chen Change Loy
60
155
0
25 Jul 2022
MVP: Robust Multi-View Practice for Driving Action Localization
MVP: Robust Multi-View Practice for Driving Action Localization
Jingjie Shang
Kunchang Li
Kaibin Tian
Haisheng Su
Yangguang Li
39
3
0
05 Jul 2022
Context-aware Proposal Network for Temporal Action Detection
Context-aware Proposal Network for Temporal Action Detection
Xiang Wang
Han Zhang
Shiwei Zhang
Changxin Gao
Yuanjie Shao
Nong Sang
17
2
0
18 Jun 2022
Analysis and Extensions of Adversarial Training for Video Classification
Analysis and Extensions of Adversarial Training for Video Classification
K. A. Kinfu
René Vidal
AAML
33
13
0
16 Jun 2022
Spatial-temporal Concept based Explanation of 3D ConvNets
Spatial-temporal Concept based Explanation of 3D ConvNets
Yi Ji
Yu Wang
K. Mori
Jien Kato
3DPC
FAtt
29
7
0
09 Jun 2022
Deepfake Caricatures: Amplifying attention to artifacts increases
  deepfake detection by humans and machines
Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines
Camilo Luciano Fosco
Emilie Josephs
A. Andonian
Allen Lee
Xi Wang
A. Oliva
44
4
0
01 Jun 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
45
29
0
13 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
85
1,263
0
04 May 2022
HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
Zhongang Cai
Daxuan Ren
Ailing Zeng
Zhengyu Lin
Tao Yu
...
Fangzhou Hong
Mingyuan Zhang
Chen Change Loy
Lei Yang
Ziwei Liu
3DH
39
101
0
28 Apr 2022
3D Convolutional Networks for Action Recognition: Application to Sport
  Gesture Recognition
3D Convolutional Networks for Action Recognition: Application to Sport Gesture Recognition
Pierre-Etienne Martin
J. Benois-Pineau
Renaud Péteri
A. Zemmari
J. Morlier
27
5
0
13 Apr 2022
Vision Models Are More Robust And Fair When Pretrained On Uncurated
  Images Without Supervision
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal
Quentin Duval
Isaac Seessel
Mathilde Caron
Ishan Misra
Levent Sagun
Armand Joulin
Piotr Bojanowski
VLM
SSL
26
110
0
16 Feb 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient
  Long-Term Video Recognition
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
48
198
0
20 Jan 2022
NapierOne: A modern mixed file data set alternative to Govdocs1
NapierOne: A modern mixed file data set alternative to Govdocs1
Simon R. Davies
Richard Macfarlane
William J. Buchanan
25
17
0
20 Jan 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
24
103
0
16 Jan 2022
Sign Language Video Retrieval with Free-Form Textual Queries
Sign Language Video Retrieval with Free-Form Textual Queries
A. Duarte
Samuel Albanie
Xavier Giró-i-Nieto
Gül Varol
SLR
53
29
0
07 Jan 2022
SLIP: Self-supervision meets Language-Image Pre-training
SLIP: Self-supervision meets Language-Image Pre-training
Norman Mu
Alexander Kirillov
David Wagner
Saining Xie
VLM
CLIP
63
481
0
23 Dec 2021
Tell me what you see: A zero-shot action recognition method based on
  natural language descriptions
Tell me what you see: A zero-shot action recognition method based on natural language descriptions
Valter Estevam
Rayson Laroca
David Menotti
Hélio Pedrini
38
13
0
18 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
100
655
0
16 Dec 2021
Co-training Transformer with Videos and Images Improves Action
  Recognition
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
28
54
0
14 Dec 2021
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for
  Few-shot Video Classification
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification
Rex Liu
Huan Zhang
Hamed Pirsiavash
Xin Liu
ViT
25
11
0
08 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
75
679
0
02 Dec 2021
Learning from Temporal Gradient for Semi-supervised Action Recognition
Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
12
51
0
25 Nov 2021
AdaPool: Exponential Adaptive Pooling for Information-Retaining
  Downsampling
AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling
Alexandros Stergiou
R. Poppe
39
79
0
01 Nov 2021
Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned
  Meta-Adaptation
Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation
Jay Patravali
Gaurav Mittal
Ye Yu
Fuxin Li
Mei Chen
18
19
0
30 Sep 2021
METEOR:A Dense, Heterogeneous, and Unstructured Traffic Dataset With
  Rare Behaviors
METEOR:A Dense, Heterogeneous, and Unstructured Traffic Dataset With Rare Behaviors
Rohan Chandra
Xijun Wang
Mridul Mahajan
Rahul Kala
Rishitha Palugulla
Chandrababu Naidu
Alok Jain
Tianyi Zhou
37
15
0
16 Sep 2021
LIGAR: Lightweight General-purpose Action Recognition
LIGAR: Lightweight General-purpose Action Recognition
Evgeny Izutov
15
3
0
30 Aug 2021
Multi-Object Tracking with Hallucinated and Unlabeled Videos
Multi-Object Tracking with Hallucinated and Unlabeled Videos
Daniel McKee
Bing Shuai
Andrew G. Berneshawi
Manchen Wang
Davide Modolo
Svetlana Lazebnik
Joseph Tighe
VOT
19
7
0
19 Aug 2021
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual
  Representations
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations
Josh Beal
Hao Wu
Dong Huk Park
Andrew Zhai
Dmitry Kislyuk
ViT
21
29
0
12 Aug 2021
UNIK: A Unified Framework for Real-world Skeleton-based Action
  Recognition
UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition
Di Yang
Yaohui Wang
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
F. Brémond
27
47
0
19 Jul 2021
Towards Long-Form Video Understanding
Towards Long-Form Video Understanding
Chaoxia Wu
Philipp Krahenbuhl
VLM
ViT
54
166
0
21 Jun 2021
Weakly-Supervised Temporal Action Localization Through Local-Global
  Background Modeling
Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling
Xiang Wang
Zhiwu Qing
Ziyuan Huang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Mingqian Tang
Yuanjie Shao
Nong Sang
29
4
0
20 Jun 2021
Towards Training Stronger Video Vision Transformers for
  EPIC-KITCHENS-100 Action Recognition
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition
Ziyuan Huang
Zhiwu Qing
Xiang Wang
Yutong Feng
Shiwei Zhang
Jianwen Jiang
Zhurong Xia
Mingqian Tang
Nong Sang
M. Ang
ViT
27
11
0
09 Jun 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation
  Learning
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
39
257
0
29 Apr 2021
VidTr: Video Transformer Without Convolutions
VidTr: Video Transformer Without Convolutions
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
148
193
0
23 Apr 2021
Multiscale Vision Transformers
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
Bo Xiong
Haoqi Fan
Kristen Grauman
Christoph Feichtenhofer
SSL
22
49
0
01 Apr 2021
Time and Frequency Network for Human Action Detection in Videos
Time and Frequency Network for Human Action Detection in Videos
Changhai Li
Huawei Chen
Jingqing Lu
Yang Huang
Yingying Liu
3DH
AI4TS
13
2
0
08 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
183
27,846
0
26 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse
  Sampling
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
46
647
0
11 Feb 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual
  Video Representation Learning
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
71
45
0
26 Jan 2021
Transformers in Vision: A Survey
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
227
2,434
0
04 Jan 2021
Refining activation downsampling with SoftPool
Refining activation downsampling with SoftPool
Alexandros Stergiou
R. Poppe
Grigorios Kalliatakis
32
158
0
02 Jan 2021
A Comprehensive Study of Deep Video Action Recognition
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
38
185
0
11 Dec 2020
Spatial-Temporal Alignment Network for Action Recognition and Detection
Spatial-Temporal Alignment Network for Action Recognition and Detection
Junwei Liang
Liangliang Cao
Xuehan Xiong
Ting Yu
Alexander G. Hauptmann
3DPC
16
9
0
04 Dec 2020
Multi-Temporal Convolutions for Human Action Recognition in Videos
Multi-Temporal Convolutions for Human Action Recognition in Videos
Alexandros Stergiou
R. Poppe
29
1
0
08 Nov 2020
A Short Note on the Kinetics-700-2020 Human Action Dataset
A Short Note on the Kinetics-700-2020 Human Action Dataset
Lucas Smaira
João Carreira
Eric Noland
Ellen Clancy
Amy Wu
Andrew Zisserman
19
137
0
21 Oct 2020
Previous
123
Next