ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.09883
  4. Cited By
Swin Transformer V2: Scaling Up Capacity and Resolution

Swin Transformer V2: Scaling Up Capacity and Resolution

18 November 2021
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
Yixuan Wei
Jia Ning
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
    ViT
ArXivPDFHTML

Papers citing "Swin Transformer V2: Scaling Up Capacity and Resolution"

50 / 823 papers shown
Title
Vision Transformers Are Good Mask Auto-Labelers
Vision Transformers Are Good Mask Auto-Labelers
Shiyi Lan
Xitong Yang
Zhiding Yu
Zuxuan Wu
J. Álvarez
Anima Anandkumar
ISeg
ViT
MedIm
24
19
0
10 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
Jia Ning
Chen Li
Zheng-Wei Zhang
Zigang Geng
Qi Dai
Kun He
Han Hu
53
44
0
05 Jan 2023
Towards Long-Term Time-Series Forecasting: Feature, Pattern, and
  Distribution
Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution
Yan Li
Xin Lu
Haoyi Xiong
Jian Tang
Jian Su
Bo Jin
Dejing Dou
AI4TS
15
24
0
05 Jan 2023
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
Sucheng Ren
Fangyun Wei
Zheng-Wei Zhang
Han Hu
40
34
0
03 Jan 2023
Rethinking Mobile Block for Efficient Attention-based Models
Rethinking Mobile Block for Efficient Attention-based Models
Jiangning Zhang
Xiangtai Li
Jian Li
Liang Liu
Zhucun Xue
Boshen Zhang
Zhe Jiang
Tianxin Huang
Yabiao Wang
Chengjie Wang
MQ
44
90
0
03 Jan 2023
Edge Enhanced Image Style Transfer via Transformers
Edge Enhanced Image Style Transfer via Transformers
Chi Zhang
Jun Yang
Zaiyan Dai
Peng-Xia Cao
16
10
0
02 Jan 2023
Transformers in Action Recognition: A Review on Temporal Modeling
Transformers in Action Recognition: A Review on Temporal Modeling
Elham Shabaninia
Hossein Nezamabadi-pour
Fatemeh Shafizadegan
ViT
24
8
0
29 Dec 2022
Local Learning on Transformers via Feature Reconstruction
Local Learning on Transformers via Feature Reconstruction
P. Pathak
Jingwei Zhang
Dimitris Samaras
ViT
24
5
0
29 Dec 2022
Boosting Out-of-Distribution Detection with Multiple Pre-trained Models
Boosting Out-of-Distribution Detection with Multiple Pre-trained Models
Feng Xue
Zi He
Chuanlong Xie
Falong Tan
Zhenguo Li
OODD
33
7
0
24 Dec 2022
Reversible Column Networks
Reversible Column Networks
Yuxuan Cai
Yi Zhou
Qi Han
Jianjian Sun
Xiangwen Kong
Jun Yu Li
Xiangyu Zhang
VLM
31
53
0
22 Dec 2022
SLGTformer: An Attention-Based Approach to Sign Language Recognition
SLGTformer: An Attention-Based Approach to Sign Language Recognition
Neil Song
Yu Xiang
SLR
30
0
0
21 Dec 2022
Universal Object Detection with Large Vision Model
Universal Object Detection with Large Vision Model
Feng-Huei Lin
Wenze Hu
Yaowei Wang
Yonghong Tian
Guangming Lu
Fanglin Chen
Yong-mei Xu
Xiaoyu Wang
VLM
ObjD
32
8
0
19 Dec 2022
Analysis and application of multispectral data for water segmentation
  using machine learning
Analysis and application of multispectral data for water segmentation using machine learning
Shubham Gupta
D. Uma
R. Hebbar
18
0
0
16 Dec 2022
Attentive Mask CLIP
Attentive Mask CLIP
Yifan Yang
Weiquan Huang
Yixuan Wei
Houwen Peng
Xinyang Jiang
...
Fangyun Wei
Yin Wang
Han Hu
Lili Qiu
Yuqing Yang
CLIP
VLM
42
27
0
16 Dec 2022
Rethinking Vision Transformers for MobileNet Size and Speed
Rethinking Vision Transformers for MobileNet Size and Speed
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
ViT
35
159
0
15 Dec 2022
FlexiViT: One Model for All Patch Sizes
FlexiViT: One Model for All Patch Sizes
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim M. Alabdulmohsin
Filip Pavetić
VLM
45
90
0
15 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
34
73
0
15 Dec 2022
What do Vision Transformers Learn? A Visual Exploration
What do Vision Transformers Learn? A Visual Exploration
Amin Ghiasi
Hamid Kazemi
Eitan Borgnia
Steven Reich
Manli Shu
Micah Goldblum
A. Wilson
Tom Goldstein
ViT
34
60
0
13 Dec 2022
Position Embedding Needs an Independent Layer Normalization
Position Embedding Needs an Independent Layer Normalization
Runyi Yu
Zhennan Wang
Yinhuai Wang
Kehan Li
Yian Zhao
Jian Zhang
Guoli Song
Jie Chen
31
1
0
10 Dec 2022
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive
  Learning
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
Jishnu Mukhoti
Tsung-Yu Lin
Omid Poursaeed
Rui Wang
Ashish Shah
Philip Torr
Ser-Nam Lim
VLM
35
80
0
09 Dec 2022
Spurious Features Everywhere -- Large-Scale Detection of Harmful
  Spurious Features in ImageNet
Spurious Features Everywhere -- Large-Scale Detection of Harmful Spurious Features in ImageNet
Yannic Neuhaus
Maximilian Augustin
Valentyn Boreiko
Matthias Hein
AAML
36
30
0
09 Dec 2022
Mitigation of Spatial Nonstationarity with Vision Transformers
Mitigation of Spatial Nonstationarity with Vision Transformers
Lei Liu
Javier E. Santos
Mavsa Prodanović
Michael J. Pyrcz
14
4
0
09 Dec 2022
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using
  CLIP and StableDiffusion
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion
Hanqing Zhao
Dianmo Sheng
Jianmin Bao
Dongdong Chen
Dong Chen
...
Ce Liu
Wenbo Zhou
Qi Chu
Weiming Zhang
Neng H. Yu
VLM
DiffM
38
39
0
07 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
24
33
0
01 Dec 2022
Finding Differences Between Transformers and ConvNets Using
  Counterfactual Simulation Testing
Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing
Nataniel Ruiz
Sarah Adel Bargal
Cihang Xie
Kate Saenko
Stan Sclaroff
ViT
39
5
0
29 Nov 2022
Transferability Estimation Based On Principal Gradient Expectation
Transferability Estimation Based On Principal Gradient Expectation
Huiyan Qi
Lechao Cheng
Jingjing Chen
Yue Yu
Xue Song
Zunlei Feng
Yueping Jiang
27
2
0
29 Nov 2022
Metal-conscious Embedding for CBCT Projection Inpainting
Metal-conscious Embedding for CBCT Projection Inpainting
F. Fan
Yangkong Wang
L. Ritschl
R. Biniazan
M. Beister
Björn Kreher
Yixing Huang
Steffen Kappler
Andreas Maier
MedIm
8
0
0
29 Nov 2022
Class Adaptive Network Calibration
Class Adaptive Network Calibration
Bingyuan Liu
Jérôme Rony
Adrian Galdran
Jose Dolz
Ismail Ben Ayed
45
8
0
28 Nov 2022
UperFormer: A Multi-scale Transformer-based Decoder for Semantic
  Segmentation
UperFormer: A Multi-scale Transformer-based Decoder for Semantic Segmentation
Jing Xu
W. Shi
Pan Gao
Zhengwei Wang
Qizhu Li
ViT
6
1
0
25 Nov 2022
DETRs with Collaborative Hybrid Assignments Training
DETRs with Collaborative Hybrid Assignments Training
Zhuofan Zong
Guanglu Song
Yu Liu
ViT
57
306
0
22 Nov 2022
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Mingg-Ming Cheng
Jiashi Feng
ViT
34
129
0
22 Nov 2022
N-Gram in Swin Transformers for Efficient Lightweight Image
  Super-Resolution
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
Haram Choi
Jeong-Sik Lee
Jihoon Yang
ViT
29
75
0
21 Nov 2022
Crowdsensing-based Road Damage Detection Challenge (CRDDC-2022)
Crowdsensing-based Road Damage Detection Challenge (CRDDC-2022)
Deeksha M. Arya
Hiroya Maeda
S. Ghosh
Durga Toshniwal
Hiroshi Omata
Takehiro Kashiyama
Osaka University of Economics
32
40
0
21 Nov 2022
Blind Knowledge Distillation for Robust Image Classification
Blind Knowledge Distillation for Robust Image Classification
Timo Kaiser
Lukas Ehmann
Christoph Reinders
Bodo Rosenhahn
NoLa
13
13
0
21 Nov 2022
EHSNet: End-to-End Holistic Learning Network for Large-Size Remote
  Sensing Image Semantic Segmentation
EHSNet: End-to-End Holistic Learning Network for Large-Size Remote Sensing Image Semantic Segmentation
Wei Chen
Yansheng Li
Bo Dang
Yongjun Zhang
29
3
0
21 Nov 2022
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text
  Spotting
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
Maoyuan Ye
Jing Zhang
Shanshan Zhao
Juhua Liu
Tongliang Liu
Bo Du
Dacheng Tao
44
71
0
19 Nov 2022
A survey on knowledge-enhanced multimodal learning
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
41
14
0
19 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo
  Matching and Optical Flow
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
29
83
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual
  Information
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
42
41
0
17 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at
  Scale
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
87
679
0
14 Nov 2022
Demystify Transformers & Convolutions in Modern Image Deep Networks
Demystify Transformers & Convolutions in Modern Image Deep Networks
Jifeng Dai
Min Shi
Weiyun Wang
Sitong Wu
Linjie Xing
...
Lewei Lu
Jie Zhou
Xiaogang Wang
Yu Qiao
Xiao-hua Hu
ViT
31
11
0
10 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with
  Deformable Convolutions
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
38
660
0
10 Nov 2022
OneFormer: One Transformer to Rule Universal Image Segmentation
OneFormer: One Transformer to Rule Universal Image Segmentation
Jitesh Jain
Jiacheng Li
M. Chiu
Ali Hassani
Nikita Orlov
Humphrey Shi
ViT
31
327
0
10 Nov 2022
StyleNAT: Giving Each Head a New Perspective
StyleNAT: Giving Each Head a New Perspective
Steven Walton
Ali Hassani
Xingqian Xu
Zhangyang Wang
Humphrey Shi
ViT
31
23
0
10 Nov 2022
Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining
Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining
Qiang Chen
Jian Wang
Chuchu Han
Shangang Zhang
Zexian Li
...
Haocheng Feng
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
ViT
VLM
42
45
0
07 Nov 2022
Late Fusion with Triplet Margin Objective for Multimodal Ideology
  Prediction and Analysis
Late Fusion with Triplet Margin Objective for Multimodal Ideology Prediction and Analysis
Changyuan Qiu
Winston Wu
Xinliang Frederick Zhang
Lu Wang
22
1
0
04 Nov 2022
Could Giant Pretrained Image Models Extract Universal Representations?
Could Giant Pretrained Image Models Extract Universal Representations?
Yutong Lin
Ze Liu
Zheng-Wei Zhang
Han Hu
Nanning Zheng
Stephen Lin
Yue Cao
VLM
54
9
0
03 Nov 2022
Learning a Condensed Frame for Memory-Efficient Video Class-Incremental
  Learning
Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning
Yixuan Pei
Zhiwu Qing
Jun Cen
Xiang Wang
Shiwei Zhang
Yaxiong Wang
Mingqian Tang
Nong Sang
Xueming Qian
27
13
0
02 Nov 2022
State-of-the-art Models for Object Detection in Various Fields of
  Application
State-of-the-art Models for Object Detection in Various Fields of Application
S. A. G. Naqvi
Syed Shahnawaz Ali
ObjD
OOD
35
0
0
01 Nov 2022
Point-Syn2Real: Semi-Supervised Synthetic-to-Real Cross-Domain Learning
  for Object Classification in 3D Point Clouds
Point-Syn2Real: Semi-Supervised Synthetic-to-Real Cross-Domain Learning for Object Classification in 3D Point Clouds
Ziwei Wang
Reza Arablouei
Jiajun Liu
Paulo Borges
G. Bishop-Hurley
Nic Heaney
3DPC
24
2
0
31 Oct 2022
Previous
123...1314151617
Next