Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.15808
Cited By
CvT: Introducing Convolutions to Vision Transformers
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CvT: Introducing Convolutions to Vision Transformers"
50 / 819 papers shown
Title
Sequence and Circle: Exploring the Relationship Between Patches
Zhengyang Yu
Jochen Triesch
ViT
31
0
0
18 Oct 2022
On effects of Knowledge Distillation on Transfer Learning
Sushil Thapa
24
1
0
18 Oct 2022
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Haoran You
Zhanyi Sun
Huihong Shi
Zhongzhi Yu
Yang Katie Zhao
Yongan Zhang
Chaojian Li
Baopu Li
Yingyan Lin
ViT
25
81
0
18 Oct 2022
Linear Video Transformer with Feature Fixation
Kaiyue Lu
Zexia Liu
Jianyuan Wang
Weixuan Sun
Zhen Qin
...
Xuyang Shen
Huizhong Deng
Xiaodong Han
Yuchao Dai
Yiran Zhong
35
4
0
15 Oct 2022
Optimizing Vision Transformers for Medical Image Segmentation
Qianying Liu
Chaitanya Kaul
Jun Wang
Christos Anagnostopoulos
Roderick Murray-Smith
F. Deligianni
ViT
MedIm
24
19
0
14 Oct 2022
MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images
Weiming Li
Lihui Xue
Xueqian Wang
Gang Li
ViT
16
12
0
14 Oct 2022
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
Hyeong Kyu Choi
Joonmyung Choi
Hyunwoo J. Kim
ViT
33
35
0
14 Oct 2022
How to Train Vision Transformer on Small-scale Datasets?
Hanan Gani
Muzammal Naseer
Mohammad Yaqub
ViT
20
51
0
13 Oct 2022
FontTransformer: Few-shot High-resolution Chinese Glyph Image Synthesis via Stacked Transformers
Yitian Liu
Zheng Lian
43
13
0
12 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
28
57
0
12 Oct 2022
SaiT: Sparse Vision Transformers through Adaptive Token Pruning
Ling Li
D. Thorsley
Joseph Hassoun
ViT
27
17
0
11 Oct 2022
Coded Residual Transform for Generalizable Deep Metric Learning
Shichao Kan
Yixiong Liang
Min Li
Yigang Cen
Jianxin Wang
Z. He
34
3
0
09 Oct 2022
Flexible Alignment Super-Resolution Network for Multi-Contrast MRI
Yiming Liu
Mengxi Zhang
Weiqin Zhang
Bo Jiang
Bo Hou
Dan Liu
Jie Chen
Heqing Lian
MedIm
33
1
0
07 Oct 2022
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks
Yen-Cheng Liu
Chih-Yao Ma
Junjiao Tian
Zijian He
Z. Kira
128
47
0
07 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
Chenglin Yang
Siyuan Qiao
Qihang Yu
Xiaoding Yuan
Yukun Zhu
Alan Yuille
Hartwig Adam
Liang-Chieh Chen
ViT
MoE
39
59
0
04 Oct 2022
Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling
Yunsung Lee
Gyuseong Lee
Kwang-seok Ryoo
Hyojun Go
Jihye Park
Seung Wook Kim
32
5
0
04 Oct 2022
Dual-former: Hybrid Self-attention Transformer for Efficient Image Restoration
Sixiang Chen
Tian-Chun Ye
Yun-Peng Liu
Erkang Chen
ViT
34
15
0
03 Oct 2022
Attention Distillation: self-supervised vision transformer students need more guidance
Kai Wang
Fei Yang
Joost van de Weijer
ViT
30
16
0
03 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
61
105
0
30 Sep 2022
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
106
88
0
30 Sep 2022
Effective Vision Transformer Training: A Data-Centric Perspective
Benjia Zhou
Pichao Wang
Jun Wan
Yan-Ni Liang
Fan Wang
26
5
0
29 Sep 2022
Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention
Xiangcheng Liu
Tianyi Wu
Guodong Guo
ViT
48
26
0
28 Sep 2022
Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection
Neelu Madan
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
Kamal Nasrollahi
Fahad Shahbaz Khan
T. Moeslund
M. Shah
ViT
MedIm
264
62
0
25 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
39
21
0
21 Sep 2022
Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions
Dong-Ming Zhang
Yi-Mou Lin
Hao Chen
Zhuotao Tian
Xin Yang
Jinhui Tang
Kwang-Ting Cheng
VLM
35
11
0
21 Sep 2022
On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks
Hubert Leterme
K. Polisano
V. Perrier
Alahari Karteek
FAtt
38
2
0
19 Sep 2022
A Mosquito is Worth 16x16 Larvae: Evaluation of Deep Learning Architectures for Mosquito Larvae Classification
Aswin Surya
David B. Peral
Austin VanLoon
A. Rajesh
MedIm
8
2
0
16 Sep 2022
Transformer based Fingerprint Feature Extraction
Saraansh Tandon
A. Namboodiri
ViT
39
8
0
08 Sep 2022
Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations
Vadim Tschernezki
Iro Laina
Diane Larlus
Andrea Vedaldi
189
185
0
07 Sep 2022
Fusion of Satellite Images and Weather Data with Transformer Networks for Downy Mildew Disease Detection
William Maillet
Maryam Ouhami
A. Hafiane
ViT
MedIm
22
6
0
06 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision
Lei Zhang
H. Shum
VLM
SSL
27
2
0
06 Sep 2022
ELMformer: Efficient Raw Image Restoration with a Locally Multiplicative Transformer
Jiaqi Ma
Shengyuan Yan
Lefei Zhang
Guoli Wang
Qian Zhang
41
8
0
31 Aug 2022
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Y. Wang
H. Sun
Xiaodi Wang
Bin Zhang
Chaonan Li
Ying Xin
Baochang Zhang
Errui Ding
Shumin Han
ViT
31
9
0
31 Aug 2022
MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta
Hisahiro Suganuma
Yoshiki Tanaka
28
2
0
30 Aug 2022
Adaptive Perception Transformer for Temporal Action Localization
Yizheng Ouyang
Tianjin Zhang
Weibo Gu
Hongfa Wang
21
3
0
25 Aug 2022
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
Mocho Go
Hideyuki Tachibana
ViT
37
9
0
24 Aug 2022
Efficient Attention-free Video Shift Transformers
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
ViT
29
1
0
23 Aug 2022
FocusFormer: Focusing on What We Need via Architecture Sampler
Jing Liu
Jianfei Cai
Bohan Zhuang
40
7
0
23 Aug 2022
DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection
Jingyu Lin
Jie Jiang
Y. Yan
Chunchao Guo
Hongfa Wang
Wei Liu
Hanzi Wang
ViT
31
3
0
21 Aug 2022
Improved Image Classification with Token Fusion
Keong-Hun Choi
Jin-Woo Kim
Yaolong Wang
J. Ha
ViT
24
0
0
19 Aug 2022
Learning Spatial-Frequency Transformer for Visual Object Tracking
Chuanming Tang
Tianlin Li
Yuanchao Bai
Zhe Wu
Jianlin Zhang
Yongmei Huang
ViT
37
43
0
18 Aug 2022
Conviformers: Convolutionally guided Vision Transformer
Mohit Vaishnav
Thomas Fel
I. F. Rodriguez
Thomas Serre
ViT
38
1
0
17 Aug 2022
Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model
Xiulong Yang
Sheng-Min Shih
Yinlin Fu
Xiaoting Zhao
Shihao Ji
DiffM
33
56
0
16 Aug 2022
Flow-Guided Transformer for Video Inpainting
Kaiwen Zhang
Jingjing Fu
Dong Liu
ViT
35
68
0
14 Aug 2022
Class-attention Video Transformer for Engagement Intensity Prediction
Xusheng Ai
Victor S. Sheng
Chunhua Li
Zhiming Cui
ViT
17
5
0
12 Aug 2022
Deep is a Luxury We Don't Have
Ahmed Taha
Yen Nhi Truong Vu
Brent Mombourquette
Thomas P. Matthews
Jason Su
Sadanand Singh
ViT
MedIm
26
2
0
11 Aug 2022
DropKey
Bonan li
Yinhan Hu
Xuecheng Nie
Congying Han
Xiangjian Jiang
Tiande Guo
Luoqi Liu
20
11
0
04 Aug 2022
Maintaining Performance with Less Data
Dominic Sanderson
Tatiana Kalgonova
33
1
0
03 Aug 2022
Global-Local Self-Distillation for Visual Representation Learning
Tim Lebailly
Tinne Tuytelaars
SSL
30
6
0
29 Jul 2022
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Yongming Rao
Wenliang Zhao
Yansong Tang
Jie Zhou
Ser-Nam Lim
Jiwen Lu
ViT
22
251
0
28 Jul 2022
Previous
1
2
3
...
9
10
11
...
15
16
17
Next