ResearchTrend.AI
Swin Transformer V2: Scaling Up Capacity and Resolution

18 November 2021
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
Yixuan Wei
Jia Ning
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
    ViT
ArXiv (abs) | PDF | HTML | Github (14834★)

Papers citing "Swin Transformer V2: Scaling Up Capacity and Resolution"

50 / 840 papers shown
InfoDisent: Explainability of Image Classification Models by Information Disentanglement
Łukasz Struski
Dawid Rymarczyk
Jacek Tabor
118
1
0
16 Sep 2024
GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion
Vitor Campagnolo Guizilini
P. Tokmakov
Achal Dave
Rares Andrei Ambrus
DiffM
73
2
0
15 Sep 2024
LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation
Qiyuan Wang
Shang Zhao
Zikang Xu
S Kevin Zhou
129
0
0
14 Sep 2024
PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage
Denis Zavadski
Damjan Kalšan
Carsten Rother
DiffM MDE
73
7
0
13 Sep 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
161
2
0
12 Sep 2024
Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU
Zhenyu Ning
Jieru Zhao
Qihao Jin
Wenchao Ding
Minyi Guo
38
7
0
11 Sep 2024
EDADepth: Enhanced Data Augmentation for Monocular Depth Estimation
Nischal Khanal
Shivanand Venkanna Sheshappanavar
MDE
96
0
0
10 Sep 2024
Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery
Fan Zhang
Lingling Li
Licheng Jiao
Xu Liu
Fang Liu
Shuyuan Yang
B. Hou
ObjD
71
0
0
09 Sep 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViT VLM
114
4
0
06 Sep 2024
SDformerFlow: Spatiotemporal swin spikeformer for event-based optical flow estimation
Yi Tian
Juan Andrade-Cetto
67
1
0
06 Sep 2024
iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation
Hayeon Jo
Hyesong Choi
Minhee Cho
Dongbo Min
124
2
0
04 Sep 2024
Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition
Yaozong Gan
Guang Li
Ren Togo
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
84
1
0
03 Sep 2024
SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation
Alberto Bacchin
Davide Allegro
Stefano Ghidoni
Emanuele Menegatti
83
1
0
02 Sep 2024
A Simple and Generalist Approach for Panoptic Segmentation
Nedyalko Prisadnikov
Wouter Van Gansbeke
Danda Pani Paudel
Luc Van Gool
VLM
116
0
0
29 Aug 2024
A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships
Gracile Astlin Pereira
Muhammad Hussain
ViT
68
10
0
27 Aug 2024
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Shunsuke Saito
VLM
135
81
0
22 Aug 2024
HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation
Mingya Zhang
Zhihao Chen
Yiyuan Ge
Xianping Tao
Mamba
110
4
0
21 Aug 2024
MsMemoryGAN: A Multi-scale Memory GAN for Palm-vein Adversarial Purification
Huafeng Qin
Yuming Fu
Huiyan Zhang
M. El-Yacoubi
Xinbo Gao
Qun Song
Jun Wang
GAN AAML
94
0
0
20 Aug 2024
Flatten: Video Action Recognition is an Image Classification task
Junlin Chen
Chengcheng Xu
Yangfan Xu
Jian Yang
Jun Yu Li
Zhiping Shi
70
1
0
17 Aug 2024
Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading
Li Pan
Yupei Zhang
Qiushi Yang
Tan Li
Xiaohan Xing
Maximus C. F. Yeung
Zhen Chen
52
1
0
16 Aug 2024
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks
Dongshuo Yin
Leiyi Hu
Bin Li
Youqun Zhang
Xue Yang
101
10
0
15 Aug 2024
Heavy Labels Out! Dataset Distillation with Label Space Lightening
Ruonan Yu
Songhua Liu
Zigeng Chen
Jingwen Ye
Xinchao Wang
DD
122
2
0
15 Aug 2024
GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution
Yuzhen Li
Zehang Deng
Yuxin Cao
Lihua Liu
64
2
0
14 Aug 2024
Advanced Vision Transformers and Open-Set Learning for Robust Mosquito Classification: A Novel Approach to Entomological Studies
Ahmed Akib Jawad Karim
Muhammad Zawad Mahmud
Riasat Khan
25
0
0
12 Aug 2024
MetMamba: Regional Weather Forecasting with Spatial-Temporal Mamba Model
Haoyu Qin
Yungang Chen
Qianchuan Jiang
Pengchao Sun
Xiancai Ye
Chao Lin
Mamba AI4CE
71
1
0
12 Aug 2024
Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
Ke Zhou
Zhongwei Qiu
Dongmei Fu
VLM
70
3
0
12 Aug 2024
Enhancing 3D Transformer Segmentation Model for Medical Image with Token-level Representation Learning
Xinrong Hu
Dewen Zeng
Yawen Wu
Xueyang Li
Yiyu Shi
ViT MedIm
75
0
0
12 Aug 2024
Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images
Shouyue Liu
Jinkui Hao
Yuanyuan Gu
Huazhu Fu
Xinyu Guo
Shuting Zhang
Yitian Zhao
Hong Song
56
1
0
09 Aug 2024
Efficient and Accurate Pneumonia Detection Using a Novel Multi-Scale Transformer Approach
Alireza Saber
Pouria Parhami
Alimihammad Siahkarzadeh
Amirreza Fateh
ViT MedIm
133
10
0
08 Aug 2024
What Happens Without Background? Constructing Foreground-Only Data for Fine-Grained Tasks
Yuetian Wang
W. Hou
Qinmu Peng
Xinge You
115
0
0
04 Aug 2024
LAM3D: Leveraging Attention for Monocular 3D Object Detection
Diana-Alexandra Sas
Leandro Di Bella
Yangxintong Lyu
F. Oniga
Adrian Munteanu
86
1
0
03 Aug 2024
NVC-1B: A Large Neural Video Coding Model
Xihua Sheng
Chuanbo Tang
Li Li
Dong Liu
Feng Wu
3DV VLM
90
3
0
28 Jul 2024
Sparse Refinement for Efficient High-Resolution Semantic Segmentation
Zhijian Liu
Zhuoyang Zhang
Samir Khaki
Shang Yang
Haotian Tang
Chenfeng Xu
Kurt Keutzer
Song Han
SSeg
84
1
0
26 Jul 2024
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi
Minjing Dong
Mingjia Li
Chang Xu
Mamba
101
3
0
26 Jul 2024
HybridDepth: Robust Depth Fusion for Mobile AR by Leveraging Depth from Focus and Single-Image Priors
Ashkan Ganj
Hang Su
Tian Guo
MDE
59
0
0
26 Jul 2024
Towards the Spectral bias Alleviation by Normalizations in Coordinate Networks
Zhicheng Cai
Hao Zhu
Qiu Shen
Xinran Wang
Xun Cao
114
0
0
25 Jul 2024
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu
Yubin Cho
Beoungwoo Kang
Seunghun Moon
Kyeongbo Kong
Suk-Ju Kang
76
3
0
24 Jul 2024
Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition
Gagan Bhatia
El Moatez Billah Nagoudi
Fakhraddin Alwajih
Muhammad Abdul-Mageed
64
4
0
18 Jul 2024
UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt
Xin Li
Bingchen Li
Yeying Jin
Cuiling Lan
Hanxin Zhu
Yulin Ren
Zhibo Chen
98
8
0
18 Jul 2024
GroupMamba: Efficient Group-Based Visual State Space Model
Abdelrahman M. Shaker
Syed Talal Wasim
Salman Khan
Juergen Gall
Fahad Shahbaz Khan
Mamba
93
0
0
18 Jul 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
103
3
0
17 Jul 2024
MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation
Xiaoshuai Hao
Ruikai Li
Hui Zhang
Dingzhe Li
Rong Yin
Sangil Jung
Seungsang Park
ByungIn Yoo
Haimei Zhao
Jing Zhang
82
9
0
16 Jul 2024
Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
Xueye Zheng
Yuanhuiyi Lyu
Jiazhou Zhou
Lin Wang
107
13
0
16 Jul 2024
SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
Hao Ding
Tuxun Lu
Yuqian Zhang
Ruixing Liang
Hongchao Shu
...
Bo Wang
Marcos Fernández-Rodríguez
Estevao Lima
João L. Vilaça
Mathias Unberath
253
4
0
16 Jul 2024
Backdoor Attacks against Image-to-Image Networks
Wenbo Jiang
Hongwei Li
Jiaming He
Rui Zhang
Guowen Xu
Tianwei Zhang
Rongxing Lu
AAML
69
5
0
15 Jul 2024
Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
Haoye Dong
Aviral Chharia
Wenbo Gou
Francisco Vicente Carrasco
Fernando de la Torre
Mamba
96
7
0
12 Jul 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
Wenshuo Peng
Kaipeng Zhang
Yue Yang
Hao Zhang
Ping Luo
VLM
75
3
0
11 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
101
4
0
10 Jul 2024
HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification
Omar S. El-Assiouti
Ghada Hamed
Dina Khattab
H. M. Ebied
84
3
0
10 Jul 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
154
74
0
10 Jul 2024