© 2025 ResearchTrend.AI, All rights reserved.

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset


22 May 2017
João Carreira
Andrew Zisserman
arXiv:1705.07750

Papers citing "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"

50 / 1,478 papers shown
End-to-End Referring Video Object Segmentation with Multimodal Transformers
Adam Botach
Evgenii Zheltonozhskii
Chaim Baskin
VOS
38
141
0
29 Nov 2021
Automated Detection of Patients in Hospital Video Recordings
Siddharth Sharma
Florian Dubost
Christopher Lee-Messer
D. Rubin
21
2
0
28 Nov 2021
Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
16
51
0
25 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
35
73
0
25 Nov 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
33
24
0
25 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Luu Anh Tuan
Lijuan Wang
Zicheng Liu
VLM
55
218
0
24 Nov 2021
Background-Click Supervision for Temporal Action Localization
Le Yang
Junwei Han
Tao Zhao
Tianwei Lin
Dingwen Zhang
Jianxin Chen
36
61
0
24 Nov 2021
Multi-label Iterated Learning for Image Classification with Label Ambiguity
Sai Rajeswar
Pau Rodríguez López
Soumye Singhal
David Vazquez
Rameswar Panda
VLM
28
30
0
23 Nov 2021
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Philip Torr
Guoying Zhao
ViT
MedIm
143
167
0
23 Nov 2021
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
27
7
0
23 Nov 2021
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis
Zhaobo Qi
Shuhui Wang
Chi Su
Li Su
Weigang Zhang
Qingming Huang
27
10
0
23 Nov 2021
Self-Regulated Learning for Egocentric Video Activity Anticipation
Zhaobo Qi
Shuhui Wang
Chi Su
Li Su
Qingming Huang
Q. Tian
EgoV
47
52
0
23 Nov 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
21
63
0
23 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TS
VLM
31
189
0
19 Nov 2021
M2A: Motion Aware Attention for Accurate Video Action Recognition
Brennan Gebotys
Alexander Wong
David A Clausi
27
3
0
18 Nov 2021
Evaluating Transformers for Lightweight Action Recognition
Raivo Koot
Markus Hennerbichler
Haiping Lu
ViT
30
8
0
18 Nov 2021
Learning to Align Sequential Actions in the Wild
Weizhe Liu
Bugra Tekin
Huseyin Coskun
Vibhav Vineet
Pascal Fua
Marc Pollefeys
30
24
0
17 Nov 2021
Language bias in Visual Question Answering: A Survey and Taxonomy
Desen Yuan
30
12
0
16 Nov 2021
Towards Domain-Independent and Real-Time Gesture Recognition Using mmWave Signal
Yadong Li
Dongheng Zhang
Jinbo Chen
Jinwei Wan
Dong Zhang
Yang Hu
Qibin Sun
Yan Chen
30
69
0
11 Nov 2021
Sparse Adversarial Video Attacks with Spatial Transformations
Ronghui Mu
Wenjie Ruan
Leandro Soriano Marcolino
Q. Ni
AAML
35
18
0
10 Nov 2021
Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition
R Gnana Praveen
Eric Granger
P. Cardinal
CVBM
31
40
0
09 Nov 2021
Towards Debiasing Temporal Sentence Grounding in Video
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
50
16
0
08 Nov 2021
Are we ready for a new paradigm shift? A Survey on Visual Deep MLP
Ruiyang Liu
Hai-Tao Zheng
Li Tao
Dun Liang
Haitao Zheng
91
97
0
07 Nov 2021
Will You Ever Become Popular? Learning to Predict Virality of Dance Clips
Jiahao Wang
Yunhong Wang
Nina Weng
Tianrui Chai
Annan Li
Faxi Zhang
Sansi Yu
32
13
0
06 Nov 2021
Sequence-to-Sequence Modeling for Action Identification at High Temporal Resolution
Aakash Kaku
Kangning Liu
A. Parnandi
H. Rajamohan
Kannan Venkataramanan
Anita Venkatesan
Audre Wirtanen
Natasha Pandit
Heidi M. Schambra
C. Fernandez‐Granda
27
5
0
03 Nov 2021
Relational Self-Attention: What's Missing in Attention for Video Understanding
Manjin Kim
Heeseung Kwon
Chunyu Wang
Suha Kwak
Minsu Cho
ViT
27
28
0
02 Nov 2021
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
19
29
0
01 Nov 2021
AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling
Alexandros Stergiou
R. Poppe
47
79
0
01 Nov 2021
Hierarchical Deep Residual Reasoning for Temporal Moment Localization
Ziyang Ma
Xianjing Han
Xuemeng Song
Yiran Cui
Liqiang Nie
18
9
0
31 Oct 2021
Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition
Dinghao Fan
Hengjie Lu
Shugong Xu
Shan Cao
32
15
0
29 Oct 2021
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval
Ning Han
Jingjing Chen
Chuhao Shi
Yawen Zeng
Guangyi Xiao
Hao Chen
24
10
0
29 Oct 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
21
25
0
27 Oct 2021
Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis
Bowen Wu
Zhenyu Xie
Xiaodan Liang
Yubei Xiao
Haoye Dong
Liang Lin
3DH
23
6
0
27 Oct 2021
CTRN: Class-Temporal Relational Network for Action Detection
Rui Dai
Srijan Das
Francois Bremond
ViT
24
22
0
26 Oct 2021
Self-Denoising Neural Networks for Few Shot Learning
S. Schwarcz
Sai Saketh Rambhatla
Ramalingam Chellappa
36
1
0
26 Oct 2021
Using Motion History Images with 3D Convolutional Networks in Isolated Sign Language Recognition
Hamed Valizadegan
D. Caldwell
SLR
32
48
0
24 Oct 2021
A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark
Zhenxi Zhu
Limin Wang
Sheng Guo
Gangshan Wu
50
32
0
24 Oct 2021
LARNet: Latent Action Representation for Human Action Synthesis
Naman Biyani
A. J. Rana
Shruti Vyas
Yogesh S Rawat
15
4
0
21 Oct 2021
TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding
Zhengwei Wang
Qi She
A. Smolic
26
9
0
17 Oct 2021
ASFormer: Transformer for Action Segmentation
Fangqiu Yi
Hongyu Wen
Tingting Jiang
ViT
79
174
0
16 Oct 2021
Shaping embodied agent behavior with activity-context priors from egocentric video
Tushar Nagarajan
Kristen Grauman
EgoV
LM&Ro
63
13
0
14 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
284
1,026
0
13 Oct 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
34
82
0
13 Oct 2021
TAda! Temporally-Adaptive Convolutions for Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Mingqian Tang
Ziwei Liu
M. Ang
53
49
0
12 Oct 2021
Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos
Zongmeng Zhang
Xianjing Han
Xuemeng Song
Yan Yan
Liqiang Nie
41
36
0
12 Oct 2021
Rethinking Supervised Pre-training for Better Downstream Transferring
Yutong Feng
Jianwen Jiang
Mingqian Tang
Rong Jin
Yue Gao
SSL
58
39
0
12 Oct 2021
Video Is Graph: Structured Graph Module for Video Action Recognition
Rongjie Li
Xiaojun Wu
Tianyang Xu
46
12
0
12 Oct 2021
Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition
M. C. Leong
Hui Li Tan
Haosong Zhang
Liyuan Li
Feng Lin
J. Lim
45
10
0
12 Oct 2021
Towards Streaming Egocentric Action Anticipation
Antonino Furnari
G. Farinella
EgoV
35
6
0
11 Oct 2021
SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
Hezhen Hu
Weichao Zhao
Wen-gang Zhou
Yuechen Wang
Houqiang Li
ViT
35
63
0
11 Oct 2021