ResearchTrend.AI
Swin Transformer V2: Scaling Up Capacity and Resolution

18 November 2021
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
Yixuan Wei
Jia Ning
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
    ViT

Papers citing "Swin Transformer V2: Scaling Up Capacity and Resolution"

50 / 823 papers shown
Enhancing Community Vision Screening -- AI Driven Retinal Photography for Early Disease Detection and Patient Trust
Xiaofeng Lei
Yih-Chung Tham
Jocelyn Hui Lin Goh
Yangqin Feng
Yang Bai
Z. Soh
Rick Siow Mong Goh
Xinxing Xu
Yong Liu
Ching-Yu Cheng
16
0
0
27 Oct 2024
PESFormer: Boosting Macro- and Micro-expression Spotting with Direct Timestamp Encoding
Wang-Wang Yu
Kai-Fu Yang
Xiangrui Hu
Jingwen Jiang
Hong-Mei Yan
Yong-Jie Li
29
0
0
24 Oct 2024
FIPER: Generalizable Factorized Features for Robust Low-Level Vision Models
Yang-Che Sun
Cheng Yu Yeo
Ernie Chu
Jun-Cheng Chen
Yu-Lun Liu
SupR
32
0
0
23 Oct 2024
LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices
Chuntao Ding
Xu Cao
Jianhang Xie
Linlin Fan
Shangguang Wang
Zhichao Lu
39
1
0
22 Oct 2024
Test-time Adversarial Defense with Opposite Adversarial Path and High Attack Time Cost
Cheng-Han Yeh
Kuanchun Yu
Chun-Shien Lu
DiffM
AAML
38
0
0
22 Oct 2024
Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation?
Lingao Xiao
Yang He
DD
32
5
0
21 Oct 2024
D-SarcNet: A Dual-stream Deep Learning Framework for Automatic Analysis of Sarcomere Structures in Fluorescently Labeled hiPSC-CMs
Huyen Le
Khiet Dang
N. H. Nguyen
Mai Tran
Hieu Pham
21
0
0
19 Oct 2024
Towards Zero-Shot Camera Trap Image Categorization
Jiří Vyskočil
Lukas Picek
VLM
26
0
0
16 Oct 2024
Transformer based super-resolution downscaling for regional reanalysis: Full domain vs tiling approaches
Antonio Pérez
Mario Santa Cruz
Daniel San Martín
José Manuel Gutiérrez
26
0
0
16 Oct 2024
Hespi: A pipeline for automatically detecting information from hebarium specimen sheets
Robert Turnbull
Emily Fitzgerald
Karen Thompson
Joanne L. Birch
20
0
0
11 Oct 2024
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention
Nguyen Huu Bao Long
Chenyu Zhang
Yuzhi Shi
Tsubasa Hirakawa
Takayoshi Yamashita
Tohgoroh Matsui
H. Fujiyoshi
36
2
0
11 Oct 2024
HorGait: A Hybrid Model for Accurate Gait Recognition in LiDAR Point Cloud Planar Projections
Jiaxing Hao
Yanxi Wang
Zhigang Chang
Hongmin Gao
Zihao Cheng
Chen Wu
Xin Zhao
Peiye Fang
Rachmat Muwardi
ViT
32
0
0
11 Oct 2024
When Graph meets Multimodal: Benchmarking and Meditating on Multimodal Attributed Graphs Learning
Hao Yan
C. Li
Zhigang Yu
Jun Yin
Ruochen Liu
Peiyan Zhang
Weihao Han
Mingzheng Li
Zhengxin Zeng
34
0
0
11 Oct 2024
IceDiff: High Resolution and High-Quality Sea Ice Forecasting with Generative Diffusion Prior
Jingyi Xu
Siwei Tu
Weidong Yang
Shuhao Li
Keyi Liu
Yeqi Luo
Lipeng Ma
Ben Fei
Junlin Wu
DiffM
AI4Cl
34
1
0
10 Oct 2024
Iterative Optimization Annotation Pipeline and ALSS-YOLO-Seg for Efficient Banana Plantation Segmentation in UAV Imagery
Ang He
Ximei Wu
Xing Xu
Jing Chen
Xiaobin Guo
Sheng Xu
26
0
0
09 Oct 2024
CALoR: Towards Comprehensive Model Inversion Defense
Hongyao Yu
Yixiang Qiu
Hao Fang
Bin Chen
Sijin Yu
Bin Wang
Shu-Tao Xia
Ke Xu
27
1
0
08 Oct 2024
GLRT-Based Metric Learning for Remote Sensing Object Retrieval
Linping Zhang
Yu Liu
Xueqian Wang
Gang Li
You He
22
0
0
08 Oct 2024
Guided Self-attention: Find the Generalized Necessarily Distinct Vectors for Grain Size Grading
Fang Gao
XueTao Li
Jiabao Wang
Shengheng Ma
Jun Yu
28
0
0
08 Oct 2024
MetaDD: Boosting Dataset Distillation with Neural Network Architecture-Invariant Generalization
Yunlong Zhao
Xiaoheng Deng
Xiu Su
Hongyan Xu
Xiuxing Li
Yijing Liu
Shan You
FedML
DD
33
1
0
07 Oct 2024
Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time
Chiao-An Yang
Ziwei Liu
Raymond A. Yeh
33
1
0
01 Oct 2024
CBAM-SwinT-BL: Small Rail Surface Defect Detection Method Based on Swin Transformer with Block Level CBAM Enhancement
Jiayi Zhao
Alison Wun-lam Yeung
Ali Muhammad
Songjiang Lai
Vincent To-Yee NG
19
2
0
30 Sep 2024
Universal Medical Image Representation Learning with Compositional Decoders
Kaini Wang
Ling Yang
Siping Zhou
Guangquan Zhou
Wentao Zhang
Bin Cui
Shuo Li
SSL
MedIm
36
0
0
30 Sep 2024
All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation
Xu Zhang
Peiyao Guo
Ming-Tse Lu
Zhan Ma
43
2
0
29 Sep 2024
Exploring Token Pruning in Vision State Space Models
Zheng Zhan
Zhenglun Kong
Yifan Gong
Yushu Wu
Zichong Meng
...
Xuan Shen
Stratis Ioannidis
Wei Niu
Pu Zhao
Yanzhi Wang
32
9
0
27 Sep 2024
Cottention: Linear Transformers With Cosine Attention
Gabriel Mongaras
Trevor Dohm
Eric C. Larson
26
0
0
27 Sep 2024
HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting
Nian Ran
Peng Xiao
Yue Wang
Wesley Shi
Jianxin Lin
Qi Meng
Richard Allmendinger
AI4Cl
44
0
0
27 Sep 2024
MALPOLON: A Framework for Deep Species Distribution Modeling
Théo Larcher
Lukás Picek
Benjamin Deneu
Titouan Lorieul
Maximilien Servajean
Alexis Joly
GP
16
0
0
26 Sep 2024
HydraViT: Stacking Heads for a Scalable ViT
Janek Haberer
A. Hojjat
Olaf Landsiedel
26
0
0
26 Sep 2024
TSCLIP: Robust CLIP Fine-Tuning for Worldwide Cross-Regional Traffic Sign Recognition
Guoyang Zhao
Fulong Ma
Weiqing Qi
Chenguang Zhang
Yuxuan Liu
Ming Liu
Jun Ma
VLM
CLIP
117
3
0
23 Sep 2024
Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection
Yuzhen Lin
Wentang Song
Bin Li
Yuezun Li
Jiangqun Ni
Han Chen
Qiushi Li
34
13
0
22 Sep 2024
Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification
Fatema Jannat
Sina Gholami
Jennifer I. Lim
Theodore Leng
Minhaj Nur Alam
Hamed Tabkhi
33
0
0
17 Sep 2024
InfoDisent: Explainability of Image Classification Models by Information Disentanglement
Łukasz Struski
Dawid Rymarczyk
Jacek Tabor
59
0
0
16 Sep 2024
GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion
Vitor Campagnolo Guizilini
P. Tokmakov
Achal Dave
Rares Ambrus
DiffM
28
2
0
15 Sep 2024
LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation
Qiyuan Wang
Shang Zhao
Zikang Xu
S Kevin Zhou
31
0
0
14 Sep 2024
PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage
Denis Zavadski
Damjan Kalšan
Carsten Rother
DiffM
MDE
25
5
0
13 Sep 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
45
1
0
12 Sep 2024
Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU
Zhenyu Ning
Jieru Zhao
Qihao Jin
Wenchao Ding
Minyi Guo
29
5
0
11 Sep 2024
EDADepth: Enhanced Data Augmentation for Monocular Depth Estimation
Nischal Khanal
Shivanand Venkanna Sheshappanavar
MDE
42
0
0
10 Sep 2024
Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery
Fan Zhang
Lingling Li
Licheng Jiao
Xu Liu
Fang Liu
Shuyuan Yang
B. Hou
ObjD
42
0
0
09 Sep 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViT
VLM
34
4
0
06 Sep 2024
SDformerFlow: Spatiotemporal swin spikeformer for event-based optical flow estimation
Yi Tian
Juan Andrade-Cetto
32
0
0
06 Sep 2024
iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation
Hayeon Jo
Hyesong Choi
Minhee Cho
Dongbo Min
41
1
0
04 Sep 2024
Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition
Yaozong Gan
Guang Li
Ren Togo
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
46
1
0
03 Sep 2024
SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation
Alberto Bacchin
Davide Allegro
Stefano Ghidoni
Emanuele Menegatti
56
1
0
02 Sep 2024
A Simple and Generalist Approach for Panoptic Segmentation
Nedyalko Prisadnikov
Wouter Van Gansbeke
Danda Pani Paudel
Luc Van Gool
VLM
48
0
0
29 Aug 2024
A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships
Gracile Astlin Pereira
Muhammad Hussain
ViT
37
7
0
27 Aug 2024
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Shunsuke Saito
VLM
43
63
0
22 Aug 2024
HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation
Mingya Zhang
Zhihao Chen
Yiyuan Ge
Xianping Tao
Mamba
63
3
0
21 Aug 2024
MsMemoryGAN: A Multi-scale Memory GAN for Palm-vein Adversarial Purification
Huafeng Qin
Yuming Fu
Huiyan Zhang
M. El-Yacoubi
Xinbo Gao
Qun Song
Jun Wang
GAN
AAML
23
0
0
20 Aug 2024
Flatten: Video Action Recognition is an Image Classification task
Junlin Chen
Chengcheng Xu
Yangfan Xu
Jian Yang
Jun Yu Li
Zhiping Shi
39
1
0
17 Aug 2024