ResearchTrend.AI
Multiscale Vision Transformers

22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
    ViT

Papers citing "Multiscale Vision Transformers"

50 / 736 papers shown
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
Yicheng Song
Tiancheng Lin
Die Peng
Su Yang
Yi Xu
MedIm
31
0
0
10 May 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
84
0
0
28 Apr 2025
A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw
Wenwen Li
Chia-Yu Hsu
Sizhe Wang
Zhining Gu
Yili Yang
Brendan M. Rogers
A. Liljedahl
61
0
0
23 Apr 2025
ECViT: Efficient Convolutional Vision Transformer with Local-Attention and Multi-scale Stages
Zhoujie Qian
ViT
26
0
0
21 Apr 2025
Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches
Guodong Shen
Yuqi Ouyang
Junru Lu
Yixuan Yang
Victor Sanchez
36
1
0
20 Apr 2025
TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion
Y. Wang
J. Li
Chaoyi Hong
Ruibo Li
Liusheng Sun
Xiao-yang Song
Zhe Wang
Zhiguo Cao
Guosheng Lin
MDE
29
0
0
16 Apr 2025
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak Nadjar Araabi
127
0
0
14 Apr 2025
CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation
Junchen Fu
Yongxin Ni
J. Jose
Ioannis Arapakis
Kaiwen Zheng
Y. Li
Xuri Ge
34
0
0
14 Apr 2025
Vision Transformers Exhibit Human-Like Biases: Evidence of Orientation and Color Selectivity, Categorical Perception, and Phase Transitions
Nooshin Bahador
24
0
0
13 Apr 2025
Audio-visual Event Localization on Portrait Mode Short Videos
Wuyang Liu
Yi Chai
Yongpeng Yan
Yanzhen Ren
21
0
0
09 Apr 2025
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Y. S. Rawat
SSL
189
0
0
08 Apr 2025
MCAT: Visual Query-Based Localization of Standard Anatomical Clips in Fetal Ultrasound Videos Using Multi-Tier Class-Aware Token Transformer
Divyanshu Mishra
Pramit Saha
He Zhao
Netzahualcoyotl Hernandez-Cruz
Olga Patey
A. Papageorghiou
J. A. Noble
26
0
0
08 Apr 2025
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Hanping Zhang
Yuhong Guo
OffRL
38
0
0
03 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
57
0
0
01 Apr 2025
Efficient Token Compression for Vision Transformer with Spatial Information Preserved
Junzhu Mao
Yang Shen
Jinyang Guo
Yazhou Yao
Xiansheng Hua
ViT
36
0
0
30 Mar 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
51
0
0
30 Mar 2025
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung
Frangil Ramirez
Juhyung Ha
Yi-Ting Chen
David J. Crandall
Yi-Hsuan Tsai
43
0
0
27 Mar 2025
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Zihang Lai
Andrea Vedaldi
39
0
0
25 Mar 2025
ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset
Zihao Chen
Hsuanyu Wu
Chi-Hsi Kung
Yi-Ting Chen
Yan-Tsung Peng
42
0
0
24 Mar 2025
Cost-Sensitive Learning for Long-Tailed Temporal Action Segmentation
Zhanzhong Pang
Fadime Sener
Shrinivas Ramasubramanian
Angela Yao
56
1
0
24 Mar 2025
EarthScape: A Multimodal Dataset for Surficial Geologic Mapping and Earth Surface Analysis
Matthew Massey
Abdullah-Al-Zubaer Imran
49
0
0
19 Mar 2025
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Shristi Das Biswas
Efstathia Soufleri
Arani Roy
Kaushik Roy
59
0
0
17 Mar 2025
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Saket Gurukar
Asim Kadav
VLM
50
0
0
17 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
53
0
0
17 Mar 2025
Domain Generalization for Improved Human Activity Recognition in Office Space Videos Using Adaptive Pre-processing
Partho Ghosh
Raisa Bentay Hossain
Mohammad Zunaed
Taufiq Hasan
58
0
0
16 Mar 2025
Spatio-temporal Fourier Transformer (StFT) for Long-term Dynamics Prediction
Da Long
Shandian Zhe
Samuel Williams
L. Oliker
Zhe Bai
AI4TS
AI4CE
44
0
0
14 Mar 2025
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
Haoxuan Li
Sixu Yan
Y. Li
Xinggang Wang
LM&Ro
59
0
0
13 Mar 2025
ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration
Mengting Ai
Tianxin Wei
Yifan Chen
Zhichen Zeng
Ritchie Zhao
G. Varatkar
B. Rouhani
Xianfeng Tang
Hanghang Tong
Jingrui He
MoE
47
1
0
10 Mar 2025
VoD: Learning Volume of Differences for Video-Based Deepfake Detection
Ying Xu
Marius Pedersen
Kiran Raja
31
0
0
10 Mar 2025
ScaleFusionNet: Transformer-Guided Multi-Scale Feature Fusion for Skin Lesion Segmentation
Saqib Qamar
Syed Furqan Qadri
Roobaea Alroobaea
Majed Alsafyani
Abdullah M. Baqasah
ViT
MedIm
89
0
0
05 Mar 2025
Video-DPRP: A Differentially Private Approach for Visual Privacy-Preserving Video Human Activity Recognition
Allassan Tchangmena A Nken
Susan Mckeever
Peter Corcoran
Ihsan Ullah
PICV
48
0
0
03 Mar 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis
Gregor Bachmann
Yeongmin Kim
Jonas Kohler
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Albert Pumarola
Ali K. Thabet
Edgar Schönfeld
89
0
0
27 Feb 2025
Hierarchical Context Transformer for Multi-level Semantic Scene Understanding
Luoying Hao
Yan Hu
Yang Yue
Li Wu
Huazhu Fu
Jinming Duan
Jiang Liu
61
0
0
24 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
69
0
0
24 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
42
0
0
11 Feb 2025
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
Yunzhi Zhuge
Hongyu Gu
Lu Zhang
Jinqing Qi
Huchuan Lu
VOS
67
2
0
14 Jan 2025
Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics
Tze Ho Elden Tse
Runyang Feng
Linfang Zheng
Jiho Park
Yixing Gao
Jihie Kim
A. Leonardis
H. Chang
49
0
0
13 Jan 2025
MS-Temba : Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
53
3
0
10 Jan 2025
Causal Deep Learning
M. Alex O. Vasilescu
CML
54
2
1
03 Jan 2025
Multiscaled Multi-Head Attention-based Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
SLR
30
16
0
03 Jan 2025
Breaking the Context Bottleneck on Long Time Series Forecasting
Chao Ma
Yikai Hou
Xiang Li
Yinggang Sun
Haining Yu
Zhou Fang
Jiaxing Qu
AI4TS
69
0
0
21 Dec 2024
ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition
Seungdong Yoa
Seungjun Lee
Hyeseung Cho
Bumsoo Kim
Woohyung Lim
ViT
70
0
0
21 Dec 2024
GCA-3D: Towards Generalized and Consistent Domain Adaptation of 3D Generators
Hengjia Li
Yang Liu
Yibo Zhao
Haoran Cheng
Yang Yang
...
Qibo Qiu
Boxi Wu
Tu Zheng
Zheng Yang
D. Cai
89
0
0
20 Dec 2024
A4-Unet: Deformable Multi-Scale Attention Network for Brain Tumor Segmentation
Ruoxin Wang
Tianyi Tang
Haiming Du
Yuxuan Cheng
Yu Wang
Lingjie Yang
Xiaohui Duan
Yunfang Yu
Yu Zhou
Donglong Chen
54
0
0
08 Dec 2024
Self-Supervised Learning with Probabilistic Density Labeling for Rainfall Probability Estimation
Junha Lee
Sojung An
Sujeong You
Namik Cho
67
0
0
08 Dec 2024
MuSiCNet: A Gradual Coarse-to-Fine Framework for Irregularly Sampled Multivariate Time Series Analysis
Jiexi Liu
Meng Cao
Songcan Chen
AI4TS
79
0
0
02 Dec 2024
Instance-Aware Graph Prompt Learning
Jiazheng Li
Jundong Li
Chuxu Zhang
VLM
67
2
0
26 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
98
0
0
20 Nov 2024
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
M. Gong
Tongliang Liu
92
6
0
18 Nov 2024
Learning Collective Dynamics of Multi-Agent Systems using Event-based Vision
Minah Lee
Uday Kamal
Saibal Mukhopadhyay
25
0
0
11 Nov 2024