MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
7 December 2021
Rui Dai
Srijan Das
Kumara Kahatapitiya
Michael S. Ryoo
Francois Bremond
ViT
Papers citing
"MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection"
47 / 47 papers shown
Title
MS-Temba : Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
93
3
0
10 Jan 2025
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li
Pengyu Liu
Guoliang Chen
Zhiliang Wu
Hehe Fan
Meng Wang
80
7
0
07 Jul 2024
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
41
6
0
26 Nov 2021
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition
Evangelos Kazakos
Jaesung Huh
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
70
45
0
01 Nov 2021
CTRN: Class-Temporal Relational Network for Action Detection
Rui Dai
Srijan Das
Francois Bremond
ViT
41
22
0
26 Oct 2021
Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection
Rui Dai
Srijan Das
Francois Bremond
65
39
0
08 Aug 2021
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Bowen Cheng
Alex Schwing
Alexander Kirillov
VLM
ViT
137
1,517
0
13 Jul 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
83
1,634
0
25 Jun 2021
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
80
1,458
0
24 Jun 2021
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Enze Xie
Wenhai Wang
Zhiding Yu
Anima Anandkumar
J. Álvarez
Ping Luo
ViT
148
4,934
0
31 May 2021
Temporal Query Networks for Fine-grained Video Understanding
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
62
83
0
19 Apr 2021
CvT: Introducing Convolutions to Vision Transformers
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
111
1,891
0
29 Mar 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
137
2,119
0
29 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
319
21,175
0
25 Mar 2021
DeepViT: Towards Deeper Vision Transformer
Daquan Zhou
Bingyi Kang
Xiaojie Jin
Linjie Yang
Xiaochen Lian
Zihang Jiang
Qibin Hou
Jiashi Feng
ViT
63
517
0
22 Mar 2021
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Stéphane d'Ascoli
Hugo Touvron
Matthew L. Leavitt
Ari S. Morcos
Giulio Biroli
Levent Sagun
ViT
92
818
0
19 Mar 2021
Modeling Multi-Label Action Dependencies for Temporal Action Localization
Praveen Tirupattur
Kevin Duarte
Yogesh S Rawat
M. Shah
39
56
0
04 Mar 2021
Coarse-Fine Networks for Temporal Activity Detection in Videos
Kumara Kahatapitiya
Michael S. Ryoo
AI4TS
67
38
0
01 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
450
3,678
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
327
2,016
0
09 Feb 2021
Relaxed Transformer Decoders for Direct Action Proposal Generation
Jing Tan
Jiaqi Tang
Limin Wang
Gangshan Wu
ViT
90
178
0
03 Feb 2021
Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection
Rui Dai
Srijan Das
Saurav Sharma
Luca Minciullo
Lorenzo Garattoni
Francois Bremond
Gianpiero Francesca
39
50
0
28 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
164
4,993
0
08 Oct 2020
Attentional Feature Fusion
Yimian Dai
Fabian Gieseke
Stefan Oehmcke
Yiquan Wu
Kobus Barnard
3DPC
37
634
0
29 Sep 2020
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
112
1,013
0
09 Apr 2020
How Much Position Information Do Convolutional Neural Networks Encode?
Md. Amirul Islam
Sen Jia
Neil D. B. Bruce
SSL
235
346
0
22 Jan 2020
G-TAD: Sub-Graph Localization for Temporal Action Detection
Mengmeng Xu
Chen Zhao
D. Rojas
Ali K. Thabet
Bernard Ghanem
95
436
0
26 Nov 2019
Objects as Points
Xingyi Zhou
Dequan Wang
Philipp Krähenbühl
3DPC
91
3,240
0
16 Apr 2019
TAN: Temporal Aggregation Network for Dense Multi-label Action Recognition
Xiyang Dai
Bharat Singh
Joe Yue-Hei Ng
L. Davis
ViT
53
25
0
14 Dec 2018
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
144
3,244
0
10 Dec 2018
A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos
Joshua Gleason
Rajeev Ranjan
S. Schwarcz
Carlos D. Castillo
Jun-Cheng Chen
Rama Chellappa
97
40
0
20 Nov 2018
Diagnosing Error in Temporal Action Detectors
Humam Alwassel
Fabian Caba Heilbron
Victor Escorcia
Bernard Ghanem
135
106
0
27 Jul 2018
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
Dima Damen
Hazel Doughty
G. Farinella
Sanja Fidler
Antonino Furnari
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
72
1,011
0
08 Apr 2018
Temporal Gaussian Mixture Layer for Videos
A. Piergiovanni
Michael S. Ryoo
70
86
0
16 Mar 2018
Path Aggregation Network for Instance Segmentation
Shu Liu
Lu Qi
Haifang Qin
Jianping Shi
Jiaya Jia
ISeg
SSeg
86
5,696
0
05 Mar 2018
Learning Latent Super-Events to Detect Multiple Activities in Videos
A. Piergiovanni
Michael S. Ryoo
36
90
0
05 Dec 2017
Single Shot Temporal Action Detection
Tianwei Lin
Xu Zhao
Zheng Shou
57
451
0
17 Oct 2017
Focal Loss for Dense Object Detection
Tsung-Yi Lin
Priya Goyal
Ross B. Girshick
Kaiming He
Piotr Dollár
ObjD
102
2,993
0
07 Aug 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
443
129,831
0
12 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
199
7,961
0
22 May 2017
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
Huijuan Xu
Abir Das
Kate Saenko
3DPC
113
714
0
22 Mar 2017
Asynchronous Temporal Fields for Action Recognition
Gunnar Sigurdsson
S. Divvala
Ali Farhadi
Abhinav Gupta
BDL
62
170
0
19 Dec 2016
Temporal Convolutional Networks for Action Segmentation and Detection
Colin S. Lea
Michael D. Flynn
René Vidal
A. Reiter
Gregory Hager
86
1,478
0
16 Nov 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xinyu Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
77
1,238
0
06 Apr 2016
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
Serena Yeung
Olga Russakovsky
Ning Jin
Mykhaylo Andriluka
Greg Mori
Li Fei-Fei
VLM
67
438
0
21 Jul 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
813
149,474
0
22 Dec 2014
Large-scale Multi-label Text Classification - Revisiting Neural Networks
Jinseok Nam
Jungi Kim
E. Mencía
Iryna Gurevych
Johannes Fürnkranz
62
362
0
19 Dec 2013