MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection


7 December 2021
Rui Dai
Srijan Das
Kumara Kahatapitiya
Michael S. Ryoo
Francois Bremond
    ViT

Papers citing "MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection"

47 / 47 papers shown
Title
MS-Temba : Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
93
3
0
10 Jan 2025
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li
Pengyu Liu
Dan Guo
Guoliang Chen
Zhiliang Wu
Hehe Fan
Meng Wang
80
7
0
07 Jul 2024
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
41
6
0
26 Nov 2021
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition
Evangelos Kazakos
Jaesung Huh
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
70
45
0
01 Nov 2021
CTRN: Class-Temporal Relational Network for Action Detection
Rui Dai
Srijan Das
Francois Bremond
ViT
41
22
0
26 Oct 2021
Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection
Rui Dai
Srijan Das
Francois Bremond
65
39
0
08 Aug 2021
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Bowen Cheng
Alex Schwing
Alexander Kirillov
VLM
ViT
137
1,517
0
13 Jul 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
83
1,634
0
25 Jun 2021
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
80
1,458
0
24 Jun 2021
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Enze Xie
Wenhai Wang
Zhiding Yu
Anima Anandkumar
J. Álvarez
Ping Luo
ViT
148
4,934
0
31 May 2021
Temporal Query Networks for Fine-grained Video Understanding
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
62
83
0
19 Apr 2021
CvT: Introducing Convolutions to Vision Transformers
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
111
1,891
0
29 Mar 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
137
2,119
0
29 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
319
21,175
0
25 Mar 2021
DeepViT: Towards Deeper Vision Transformer
Daquan Zhou
Bingyi Kang
Xiaojie Jin
Linjie Yang
Xiaochen Lian
Zihang Jiang
Qibin Hou
Jiashi Feng
ViT
63
517
0
22 Mar 2021
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Stéphane d'Ascoli
Hugo Touvron
Matthew L. Leavitt
Ari S. Morcos
Giulio Biroli
Levent Sagun
ViT
92
818
0
19 Mar 2021
Modeling Multi-Label Action Dependencies for Temporal Action Localization
Praveen Tirupattur
Kevin Duarte
Yogesh S Rawat
M. Shah
39
56
0
04 Mar 2021
Coarse-Fine Networks for Temporal Activity Detection in Videos
Kumara Kahatapitiya
Michael S. Ryoo
AI4TS
67
38
0
01 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
450
3,678
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
327
2,016
0
09 Feb 2021
Relaxed Transformer Decoders for Direct Action Proposal Generation
Jing Tan
Jiaqi Tang
Limin Wang
Gangshan Wu
ViT
90
178
0
03 Feb 2021
Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection
Rui Dai
Srijan Das
Saurav Sharma
Luca Minciullo
Lorenzo Garattoni
Francois Bremond
Gianpiero Francesca
39
50
0
28 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
164
4,993
0
08 Oct 2020
Attentional Feature Fusion
Yimian Dai
Fabian Gieseke
Stefan Oehmcke
Yiquan Wu
Kobus Barnard
3DPC
37
634
0
29 Sep 2020
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
112
1,013
0
09 Apr 2020
How Much Position Information Do Convolutional Neural Networks Encode?
Md. Amirul Islam
Sen Jia
Neil D. B. Bruce
SSL
235
346
0
22 Jan 2020
G-TAD: Sub-Graph Localization for Temporal Action Detection
Mengmeng Xu
Chen Zhao
D. Rojas
Ali K. Thabet
Bernard Ghanem
95
436
0
26 Nov 2019
Objects as Points
Xingyi Zhou
Dequan Wang
Philipp Krähenbühl
3DPC
91
3,240
0
16 Apr 2019
TAN: Temporal Aggregation Network for Dense Multi-label Action Recognition
Xiyang Dai
Bharat Singh
Joe Yue-Hei Ng
L. Davis
ViT
53
25
0
14 Dec 2018
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
144
3,244
0
10 Dec 2018
A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos
Joshua Gleason
Rajeev Ranjan
S. Schwarcz
Carlos D. Castillo
Jun-Cheng Chen
Rama Chellappa
97
40
0
20 Nov 2018
Diagnosing Error in Temporal Action Detectors
Humam Alwassel
Fabian Caba Heilbron
Victor Escorcia
Bernard Ghanem
135
106
0
27 Jul 2018
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
Dima Damen
Hazel Doughty
G. Farinella
Sanja Fidler
Antonino Furnari
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
72
1,011
0
08 Apr 2018
Temporal Gaussian Mixture Layer for Videos
A. Piergiovanni
Michael S. Ryoo
70
86
0
16 Mar 2018
Path Aggregation Network for Instance Segmentation
Shu Liu
Lu Qi
Haifang Qin
Jianping Shi
Jiaya Jia
ISeg
SSeg
86
5,696
0
05 Mar 2018
Learning Latent Super-Events to Detect Multiple Activities in Videos
A. Piergiovanni
Michael S. Ryoo
36
90
0
05 Dec 2017
Single Shot Temporal Action Detection
Tianwei Lin
Xu Zhao
Zheng Shou
57
451
0
17 Oct 2017
Focal Loss for Dense Object Detection
Tsung-Yi Lin
Priya Goyal
Ross B. Girshick
Kaiming He
Piotr Dollár
ObjD
102
2,993
0
07 Aug 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
443
129,831
0
12 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
199
7,961
0
22 May 2017
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
Huijuan Xu
Abir Das
Kate Saenko
3DPC
113
714
0
22 Mar 2017
Asynchronous Temporal Fields for Action Recognition
Gunnar Sigurdsson
S. Divvala
Ali Farhadi
Abhinav Gupta
BDL
62
170
0
19 Dec 2016
Temporal Convolutional Networks for Action Segmentation and Detection
Colin S. Lea
Michael D. Flynn
René Vidal
A. Reiter
Gregory Hager
86
1,478
0
16 Nov 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xinyu Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
77
1,238
0
06 Apr 2016
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
Serena Yeung
Olga Russakovsky
Ning Jin
Mykhaylo Andriluka
Greg Mori
Li Fei-Fei
VLM
67
438
0
21 Jul 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
813
149,474
0
22 Dec 2014
Large-scale Multi-label Text Classification - Revisiting Neural Networks
Jinseok Nam
Jungi Kim
E. Mencía
Iryna Gurevych
Johannes Fürnkranz
62
362
0
19 Dec 2013