arXiv: 2104.11227
Multiscale Vision Transformers

22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
    ViT

Papers citing "Multiscale Vision Transformers"

50 / 736 papers shown
GroupMamba: Efficient Group-Based Visual State Space Model
Abdelrahman M. Shaker
Syed Talal Wasim
Salman Khan
Juergen Gall
Fahad Shahbaz Khan
Mamba
56
0
0
18 Jul 2024
Towards AI-Powered Video Assistant Referee System (VARS) for Association Football
Jan Held
A. Cioppa
Silvio Giancola
Abdullah Hamdi
Christel Devue
Bernard Ghanem
Marc Van Droogenbroeck
37
4
0
17 Jul 2024
AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
Yunling Zheng
Zeyi Xu
Fanghui Xue
Biao Yang
Jiancheng Lyu
Shuai Zhang
Y. Qi
Jack Xin
48
0
0
16 Jul 2024
Human-Centric Transformer for Domain Adaptive Action Recognition
Kun-Yu Lin
Jiaming Zhou
Wei-Shi Zheng
26
6
0
15 Jul 2024
Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning
Xinyuan Gao
Songlin Dong
Yuhang He
Qiang Wang
Yihong Gong
CLL
24
13
0
14 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
39
7
0
11 Jul 2024
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li
Zhenhua Feng
Tianyang Xu
Linze Li
Xiao-Jun Wu
Muhammad Awais
Sara Atito
Josef Kittler
CoGe
52
5
0
08 Jul 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
44
4
0
21 Jun 2024
GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement
Hao Wang
E. Ahn
Jinman Kim
40
0
0
19 Jun 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
36
0
0
11 Jun 2024
Video-based Exercise Classification and Activated Muscle Group Prediction with Hybrid X3D-SlowFast Network
Manvik Pasula
Pramit Saha
18
0
0
10 Jun 2024
RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model
Khaled Alomar
Halil Ibrahim Aysel
Xiaohao Cai
MedIm
ViT
37
7
0
02 Jun 2024
Use of a Multiscale Vision Transformer to predict Nursing Activities Score from Low Resolution Thermal Videos in an Intensive Care Unit
Isaac YL Lee
Thanh Nguyen-Duc
Ryo Ueno
Jesse Smith
P. Chan
18
0
0
30 May 2024
Visualizing the loss landscape of Self-supervised Vision Transformer
Youngwan Lee
Jeffrey Willette
Jonghee Kim
Sung Ju Hwang
ViT
35
1
0
28 May 2024
Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception
Shuangpeng Han
Ziyu Wang
Mengmi Zhang
26
0
0
26 May 2024
ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Sucheng Ren
Hongru Zhu
Chen Wei
Yijiang Li
Alan L. Yuille
Cihang Xie
AI4TS
VGen
SSL
53
1
0
24 May 2024
Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation
Daniel Kienzle
Marco Kantonis
Robin Schon
Rainer Lienhart
33
2
0
23 May 2024
Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks
M. Prabhushankar
Ghassan AlRegib
UQCV
27
0
0
22 May 2024
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
35
9
0
22 May 2024
BIMM: Brain Inspired Masked Modeling for Video Representation Learning
Zhifan Wan
Jie M. Zhang
Chang-bo Li
Shiguang Shan
69
0
0
21 May 2024
"Previously on ..." From Recaps to Story Summarization
Aditya Kumar Singh
Dhruv Srivastava
Makarand Tapaswi
42
0
0
19 May 2024
GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
SLR
ViT
36
2
0
18 May 2024
Towards Gradient-based Time-Series Explanations through a SpatioTemporal Attention Network
Min Hun Lee
AI4TS
ViT
FAtt
24
3
0
18 May 2024
Generative Artificial Intelligence: A Systematic Review and Applications
S. S. Sengar
Affan Bin Hasan
Sanjay Kumar
Fiona Carroll
MedIm
28
51
0
17 May 2024
Open-Vocabulary Spatio-Temporal Action Detection
Tao Wu
Shuqiu Ge
Jie Qin
Gangshan Wu
Limin Wang
ObjD
23
5
0
17 May 2024
Open-Vocabulary Object Detection via Neighboring Region Attention Alignment
Sunyuan Qiang
Xianfei Li
Yanyan Liang
Wenlong Liao
Tao He
Pai Peng
ObjD
27
0
0
14 May 2024
A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection
Matthew Korban
Peter Youngs
Scott T. Acton
ViT
27
6
0
13 May 2024
Deep video representation learning: a survey
Elham Ravanbakhsh
Yongqing Liang
J. Ramanujam
Xin Li
49
3
0
10 May 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
28
1
0
09 May 2024
AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition
Meiqi Cao
Rui Yan
Xiangbo Shu
Guangzhao Dai
Yazhou Yao
Guo-Sen Xie
36
0
0
04 May 2024
Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy
Hoang-Quan Nguyen
Thanh-Dat Truong
Khoa Luu
34
1
0
02 May 2024
Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable
Haozhe Liu
Wentian Zhang
Bing Li
Bernard Ghanem
Jürgen Schmidhuber
DiffM
WIGM
AAML
28
1
0
01 May 2024
A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation
Xin Zhang
Liangxiu Han
Tam Sobeih
Lianghao Han
Darren Dancey
50
1
0
26 Apr 2024
UniRGB-IR: A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning
Maoxun Yuan
Bo Cui
Tianyi Zhao
Xingxing Wei
Shan Fu
Xue Yang
Xingxing Wei
35
0
0
26 Apr 2024
MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis
Jiaxin Zhuang
Linshan Wu
Qiong Wang
V. Vardhanabhuti
Lin Luo
Hao Chen
Hao Chen
57
4
0
24 Apr 2024
Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing
Yuang Liu
Zhiheng Qiu
Xiaokai Qin
ViT
31
0
0
20 Apr 2024
STAT: Towards Generalizable Temporal Action Localization
Yangcen Liu
Ziyi Liu
Yuanhao Zhai
Wen Li
David Doerman
Junsong Yuan
29
2
0
20 Apr 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
41
1
0
18 Apr 2024
Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
Ying Zhang
Yuezun Li
Bo Peng
Jiaran Zhou
Huiyu Zhou
Junyu Dong
40
0
0
17 Apr 2024
Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
Kim Hoang Tran
Phuc Vuong Do
Ngoc Quoc Ly
Ngan Le
36
4
0
15 Apr 2024
STMixer: A One-Stage Sparse Action Detector
Tao Wu
Mengqing Cao
Ziteng Gao
Gangshan Wu
Limin Wang
22
0
0
15 Apr 2024
Multimodal Attack Detection for Action Recognition Models
Furkan Mumcu
Yasin Yılmaz
AAML
31
1
0
13 Apr 2024
Improving Continuous Sign Language Recognition with Adapted Image Models
Lianyu Hu
Tongkai Shi
Liqing Gao
Zekang Liu
Wei Feng
VLM
20
5
0
12 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
81
88
0
08 Apr 2024
Dual-Scale Transformer for Large-Scale Single-Pixel Imaging
Gang Qu
Ping Wang
Xin Yuan
MedIm
24
1
0
07 Apr 2024
X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model
Jan Held
Hani Itani
A. Cioppa
Silvio Giancola
Bernard Ghanem
Marc Van Droogenbroeck
33
16
0
07 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
44
3
0
06 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
24
7
0
05 Apr 2024
Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models
M. Kowal
Richard P. Wildes
Konstantinos G. Derpanis
GNN
30
8
0
02 Apr 2024