2111.09883
Cited By
Swin Transformer V2: Scaling Up Capacity and Resolution
18 November 2021
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
Yixuan Wei
Jia Ning
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
ViT
Github (14834★)
Papers citing
"Swin Transformer V2: Scaling Up Capacity and Resolution"
50 / 840 papers shown
Title
DETRs with Collaborative Hybrid Assignments Training
Zhuofan Zong
Guanglu Song
Yu Liu
ViT
134
330
0
22 Nov 2022
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Ming-Ming Cheng
Jiashi Feng
ViT
126
141
0
22 Nov 2022
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
Haram Choi
Jeong-Sik Lee
Jihoon Yang
ViT
77
82
0
21 Nov 2022
Crowdsensing-based Road Damage Detection Challenge (CRDDC-2022)
Deeksha M. Arya
Hiroya Maeda
S. Ghosh
Durga Toshniwal
Hiroshi Omata
Takehiro Kashiyama
Osaka University of Economics
70
43
0
21 Nov 2022
Blind Knowledge Distillation for Robust Image Classification
Timo Kaiser
Lukas Ehmann
Christoph Reinders
Bodo Rosenhahn
NoLa
62
13
0
21 Nov 2022
EHSNet: End-to-End Holistic Learning Network for Large-Size Remote Sensing Image Semantic Segmentation
Wei Chen
Yansheng Li
Bo Dang
Yongjun Zhang
87
3
0
21 Nov 2022
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
Maoyuan Ye
Jing Zhang
Shanshan Zhao
Juhua Liu
Tongliang Liu
Bo Du
Dacheng Tao
151
77
0
19 Nov 2022
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
155
15
0
19 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
123
97
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
97
42
0
17 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
226
729
0
14 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
180
698
0
10 Nov 2022
OneFormer: One Transformer to Rule Universal Image Segmentation
Jitesh Jain
Jiacheng Li
M. Chiu
Ali Hassani
Nikita Orlov
Humphrey Shi
ViT
78
348
0
10 Nov 2022
Demystify Transformers & Convolutions in Modern Image Deep Networks
Jifeng Dai
Min Shi
Weiyun Wang
Sitong Wu
Linjie Xing
...
Lewei Lu
Jie Zhou
Xiaogang Wang
Yu Qiao
Xiao-hua Hu
ViT
80
11
0
10 Nov 2022
Efficient Image Generation with Variadic Attention Heads
Steven Walton
Ali Hassani
Xingqian Xu
Zhangyang Wang
Humphrey Shi
ViT
84
23
0
10 Nov 2022
Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining
Qiang Chen
Jian Wang
Chuchu Han
Shangang Zhang
Zexian Li
...
Haocheng Feng
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
ViT
VLM
90
45
0
07 Nov 2022
Late Fusion with Triplet Margin Objective for Multimodal Ideology Prediction and Analysis
Changyuan Qiu
Winston Wu
Xinliang Frederick Zhang
Lu Wang
55
1
0
04 Nov 2022
Could Giant Pretrained Image Models Extract Universal Representations?
Yutong Lin
Ze Liu
Zheng Zhang
Han Hu
Nanning Zheng
Stephen Lin
Yue Cao
VLM
106
9
0
03 Nov 2022
Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning
Yixuan Pei
Zhiwu Qing
Jun Cen
Xiang Wang
Shiwei Zhang
Yaxiong Wang
Mingqian Tang
Nong Sang
Xueming Qian
56
13
0
02 Nov 2022
State-of-the-art Models for Object Detection in Various Fields of Application
S. A. G. Naqvi
Syed Shahnawaz Ali
ObjD
OOD
125
0
0
01 Nov 2022
Point-Syn2Real: Semi-Supervised Synthetic-to-Real Cross-Domain Learning for Object Classification in 3D Point Clouds
Ziwei Wang
Reza Arablouei
Jiajun Liu
Paulo Borges
G. Bishop-Hurley
Nic Heaney
3DPC
36
2
0
31 Oct 2022
The Curious Case of Benign Memorization
Sotiris Anagnostidis
Gregor Bachmann
Lorenzo Noci
Thomas Hofmann
AAML
132
10
0
25 Oct 2022
BARS: A Benchmark for Airport Runway Segmentation
Wenhui Chen
Zhijiang Zhang
Liang Yu
Yichun Tai
152
11
0
24 Oct 2022
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Chi Zhang
Lu Zhou
Lei Wang
Zaiyan Dai
Jun Yang
ViT
123
27
0
22 Oct 2022
Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets
Xiangyu Chen
Qinghao Hu
Kaidong Li
Cuncong Zhong
Guanghui Wang
ViT
81
13
0
22 Oct 2022
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
133
38
0
19 Oct 2022
A Tri-Layer Plugin to Improve Occluded Detection
Guanqi Zhan
Weidi Xie
Andrew Zisserman
75
20
0
18 Oct 2022
Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation
Rui Li
Weihua Li
Yi Yang
Hanyu Wei
Jianhua Jiang
Quan-wei Bai
DiffM
142
11
0
18 Oct 2022
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
132
474
0
17 Oct 2022
2nd Place Solution to Google Universal Image Embedding
Xiaolong Huang
Qiankun Li
SSL
89
2
0
17 Oct 2022
Probabilistic Integration of Object Level Annotations in Chest X-ray Classification
Tom van Sonsbeek
Xiantong Zhen
Dwarikanath Mahapatra
M. Worring
67
14
0
13 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
107
51
0
13 Oct 2022
S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces
Eric N. D. Nguyen
Karan Goel
Albert Gu
Gordon W. Downs
Preey Shah
Tri Dao
S. Baccus
Christopher Ré
VLM
93
40
0
12 Oct 2022
How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization
Jonas Geiping
Micah Goldblum
Gowthami Somepalli
Ravid Shwartz-Ziv
Tom Goldstein
A. Wilson
107
43
0
12 Oct 2022
Match Cutting: Finding Cuts with Smooth Visual Transitions
Boris Chen
Amir Ziai
Rebecca Tucker
Yuchen Xie
VGen
100
14
0
11 Oct 2022
Curved Representation Space of Vision Transformers
Juyeop Kim
Junha Park
Songkuk Kim
Jongseok Lee
ViT
75
7
0
11 Oct 2022
Rethinking the Detection Head Configuration for Traffic Object Detection
Yi Shi
Jiang Wu
Shixuan Zhao
Gangyao Gao
T. Deng
Hongmei Yan
ObjD
82
5
0
08 Oct 2022
Humans need not label more humans: Occlusion Copy & Paste for Occluded Human Instance Segmentation
Evan Ling
De-Kai Huang
Minhoe Hur
111
5
0
07 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT
MoE
118
66
0
04 Oct 2022
Dual-former: Hybrid Self-attention Transformer for Efficient Image Restoration
Sixiang Chen
Tian-Chun Ye
Yun-Peng Liu
Erkang Chen
ViT
66
17
0
03 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng Zhang
Chao Zhang
Hanhua Hu
114
31
0
03 Oct 2022
Dilated Neighborhood Attention Transformer
Ali Hassani
Humphrey Shi
ViT
MedIm
112
73
0
29 Sep 2022
Transfer Learning with Pretrained Remote Sensing Transformers
A. Fuller
K. Millard
J.R. Green
70
11
0
28 Sep 2022
Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration
Marcos V. Conde
Ui-Jin Choi
Maxime Burchi
Radu Timofte
ViT
145
142
0
22 Sep 2022
VINet: Visual and Inertial-based Terrain Classification and Adaptive Navigation over Unknown Terrain
Tianrui Guan
Ruitao Song
Zhixian Ye
Liangjun Zhang
80
11
0
16 Sep 2022
Communication-Efficient and Privacy-Preserving Feature-based Federated Transfer Learning
Feng Wang
M. C. Gursoy
Senem Velipasalar
103
2
0
12 Sep 2022
LRT: An Efficient Low-Light Restoration Transformer for Dark Light Field Images
Shansi Zhang
Nan Meng
E. Lam
ViT
93
23
0
06 Sep 2022
A Review of Sparse Expert Models in Deep Learning
W. Fedus
J. Dean
Barret Zoph
MoE
129
154
0
04 Sep 2022
AutoPET Challenge: Combining nn-Unet with Swin UNETR Augmented by Maximum Intensity Projection Classifier
Lars Heiliger
Zdravko Marinov
Max Hasin
André Ferreira
Jana Fragemann
...
D. Kersting
Victor Alves
Rainer Stiefelhagen
Jan Egger
Jens Kleesiek
44
9
0
02 Sep 2022
AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results
Ren Yang
Radu Timofte
Xin Li
Qi Zhang
Lin Zhang
...
Yijian Zhang
Mao Ye
Dengyan Luo
Xiaofeng Pan
L. Peng
SupR
140
30
0
23 Aug 2022