ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.08451
  4. Cited By
Efficient Video Action Detection with Token Dropout and Context
  Refinement
v1v2v3 (latest)

Efficient Video Action Detection with Token Dropout and Context Refinement

17 April 2023
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
ArXiv (abs)PDFHTMLGithub (33★)

Papers citing "Efficient Video Action Detection with Token Dropout and Context Refinement"

37 / 37 papers shown
Title
Action tube generation by person query matching for spatio-temporal action detection
Action tube generation by person query matching for spatio-temporal action detection
Kazuki Omi
Jion Oshima
Toru Tamaki
130
0
0
17 Mar 2025
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
113
0
0
18 Dec 2024
STMixer: A One-Stage Sparse Action Detector
STMixer: A One-Stage Sparse Action Detector
Tao Wu
Mengqing Cao
Ziteng Gao
Gangshan Wu
Limin Wang
60
0
0
15 Apr 2024
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action
  Detection
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
48
3
0
28 Mar 2023
End-to-End Semi-Supervised Learning for Video Action Detection
End-to-End Semi-Supervised Learning for Video Action Detection
Akash Kumar
Yogesh S Rawat
53
32
0
08 Mar 2022
Learning to Merge Tokens in Vision Transformers
Learning to Merge Tokens in Vision Transformers
Cédric Renggli
André Susano Pinto
N. Houlsby
Basil Mustafa
J. Puigcerver
C. Riquelme
MoMe
64
58
0
24 Feb 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
149
668
0
16 Dec 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
57
65
0
23 Nov 2021
ViDT: An Efficient and Effective Fully Transformer-based Object Detector
ViDT: An Efficient and Effective Fully Transformer-based Object Detector
Hwanjun Song
Deqing Sun
Sanghyuk Chun
Varun Jampani
Dongyoon Han
Byeongho Heo
Wonjae Kim
Ming-Hsuan Yang
141
77
0
08 Oct 2021
Video Swin Transformer
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
106
1,482
0
24 Jun 2021
Space-time Mixing Attention for Video Transformer
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
91
126
0
10 Jun 2021
DynamicViT: Efficient Vision Transformers with Dynamic Token
  Sparsification
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
Yongming Rao
Wenliang Zhao
Benlin Liu
Jiwen Lu
Jie Zhou
Cho-Jui Hsieh
ViT
90
699
0
03 Jun 2021
Adaptive Focus for Efficient Video Recognition
Adaptive Focus for Efficient Video Recognition
Yulin Wang
Zhaoxi Chen
Haojun Jiang
Shiji Song
Yizeng Han
Gao Huang
74
99
0
07 May 2021
Multiscale Vision Transformers
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
132
1,259
0
22 Apr 2021
All Tokens Matter: Token Labeling for Training Better Vision
  Transformers
All Tokens Matter: Token Labeling for Training Better Vision Transformers
Zihang Jiang
Qibin Hou
Li-xin Yuan
Daquan Zhou
Yujun Shi
Xiaojie Jin
Anran Wang
Jiashi Feng
ViT
85
209
0
22 Apr 2021
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
Yuan Zhi
Zhan Tong
Limin Wang
Gangshan Wu
TTA
51
73
0
20 Apr 2021
Rethinking Spatial Dimensions of Vision Transformers
Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo
Sangdoo Yun
Dongyoon Han
Sanghyuk Chun
Junsuk Choe
Seong Joon Oh
ViT
506
581
0
30 Mar 2021
ViViT: A Video Vision Transformer
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
222
2,150
0
29 Mar 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
389
2,053
0
09 Feb 2021
Context-Aware RCNN: A Baseline for Action Detection in Videos
Context-Aware RCNN: A Baseline for Action Detection in Videos
Jianchao Wu
Zhanghui Kuang
Limin Wang
Wayne Zhang
Gangshan Wu
110
80
0
20 Jul 2020
Dynamic Sampling Networks for Efficient Action Recognition in Videos
Dynamic Sampling Networks for Efficient Action Recognition in Videos
Yin-Dong Zheng
Zhaoyang Liu
Tong Lu
Limin Wang
52
77
0
28 Jun 2020
End-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT3DVPINN
421
13,048
0
26 May 2020
Asynchronous Interaction Aggregation for Action Detection
Asynchronous Interaction Aggregation for Action Detection
Jiajun Tang
Jinchao Xia
Xinzhi Mu
Bo Pang
Cewu Lu
65
120
0
16 Apr 2020
X3D: Expanding Architectures for Efficient Video Recognition
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
134
1,020
0
09 Apr 2020
You Only Watch Once: A Unified CNN Architecture for Real-Time
  Spatiotemporal Action Localization
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization
Okan Kopuklu
Xiangyu Wei
Gerhard Rigoll
79
144
0
15 Nov 2019
Slow Feature Analysis for Human Action Recognition
Slow Feature Analysis for Human Action Recognition
Zhang Zhang
Dacheng Tao
43
315
0
15 Jul 2019
Long-Term Feature Banks for Detailed Video Understanding
Long-Term Feature Banks for Detailed Video Understanding
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krahenbuhl
Ross B. Girshick
179
480
0
12 Dec 2018
SlowFast Networks for Video Recognition
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
166
3,274
0
10 Dec 2018
Video Action Transformer Network
Video Action Transformer Network
Rohit Girdhar
João Carreira
Carl Doersch
Andrew Zisserman
ViT
131
709
0
06 Dec 2018
AdaFrame: Adaptive Frame Selection for Fast Video Recognition
AdaFrame: Adaptive Frame Selection for Fast Video Recognition
Zuxuan Wu
Caiming Xiong
Chih-Yao Ma
R. Socher
L. Davis
180
199
0
29 Nov 2018
TSM: Temporal Shift Module for Efficient Video Understanding
TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin
Chuang Gan
Song Han
98
1,691
0
20 Nov 2018
Actor-Centric Relation Network
Actor-Centric Relation Network
Chen Sun
Abhinav Shrivastava
Carl Vondrick
Kevin Patrick Murphy
Rahul Sukthankar
Cordelia Schmid
102
221
0
28 Jul 2018
Compressed Video Action Recognition
Compressed Video Action Recognition
Chao-Yuan Wu
Manzil Zaheer
Hexiang Hu
R. Manmatha
Alex Smola
Philipp Krahenbuhl
143
325
0
02 Dec 2017
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual
  Actions
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu
Chen Sun
David A. Ross
Carl Vondrick
C. Pantofaru
...
G. Toderici
Susanna Ricco
Rahul Sukthankar
Cordelia Schmid
Jitendra Malik
VGen
107
1,030
0
23 May 2017
Mask R-CNN
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
352
27,195
0
20 Mar 2017
Feature Pyramid Networks for Object Detection
Feature Pyramid Networks for Object Detection
Nayeon Lee
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
477
22,108
0
09 Dec 2016
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIPVGen
155
6,162
0
03 Dec 2012
1